METHODOLOGY IN PSYCHOLOGY* 


JOHN C. FLANAGAN 


AMERICAN INSTITUTE FOR RESEARCH 
UNIVERSITY OF PITTSBURGH 


Introduction 


An occasion of this sort seems a good time to take stock of the aims 
and methods of the members of this Society in their professional activities. 
Too often, even in a selected group of this type, a casual observer might get 
the impression that the aims of the members were primarily to prepare and get 
published a respectable number of journal articles with an occasional chapter 
in a book, and now and then a major publication of some type. Without 
wishing to lay claim to having made a research study in motivation, casual 
observation suggests that at least some of this activity is undertaken in the 
hope that individuals will be moved up to the next rung on the ladder or 
offered a more attractive position in some other locality. 

In spite of such appearances it seems certain that anyone who has 
listened closely to the meetings both formal and informal during the past 
week would be forced to the conclusion that the individual members of this 
Society have a genuine interest in achieving more basic goals than those just 
mentioned. 

As a basic factor to be considered in such a review as I propose, it is 
of importance that the overwhelming majority of the members of the Psy- 
chometric Society are individuals who regard themselves as primarily psy- 
chologists. It is true that they are specialists and a number of them have had 
special training in mathematics. But considered one by one as we run down 
the membership list, the impression seems unmistakable that their goals 
are psychological goals to a greater extent than mathematical goals. The 
interests of this group may, therefore, be regarded as both rational investi- 
gation in the mathematical field, especially in regard to statistical theory, 
and also empirical studies utilizing the methods of science. Both inductive 
and deductive methods play important roles in the activities of most of the 
members of the group. In the present discussion I wish to confine myself 
primarily to the scientific and psychological aspects of the activities of the 
Society members rather than the mathematical. 

*This paper was presented as the Presidential Address to the Psychometric Society, 
September 5, 1952. 
As psychologists, what are we trying to do and how are we going about 
doing it? Perhaps we should begin with some basic points on which we can 
hope to get general agreement. As our first statements let us take: The 
aim of psychology is the formulation of scientific knowledge regarding human 
behavior. All knowledge is based on perceptions made by individuals. It is 
assumed that observations give us a first approximation to knowledge. A 
primary function of science is to supply devices for increasing the precision 
of these initial observations. Following the lead of the logical positivists in 
philosophy and the operationists in physics, a number of psychologists, 
including S. S. Stevens, Kenneth Spence, Melvin Marx, and others, have 
worked on formalizing the bases for a modern methodology for psychology. 

It is generally agreed that the ultimate aim of psychology is the under- 
standing and explanation of behavior. The practical side of such under- 
standing and explanation is the ability to predict and to control. Our ability 
to predict or influence human behavior depends on the development of 
principles or laws. Current theorists regard no such principles or laws as 
certain but only probable. Although empirical observation may enable us 
to make predictions which are later confirmed, the generalizations on which 
these predictions are based must be supplemented by more general theoretical 
principles or explanations which have been tested and confirmed before these 
generalizations can be admitted to the body of scientific knowledge. 

These principles or explanations are derived from a propensity to make 
inferences of the kind consonant with such principles. The principles are 
usually made explicit only by reflecting on these inferences. The usual 
argument for the acceptance of psychological principles or laws is that they 
are the simplest explanations consistent with the observations which are 
available. 

To go back a step, we find that generalizations are based on concepts or 
classes. The operationists regard a concept as defined by a set of operations. 
It is clear that concept formulation is essential to scientific studies, since all 
objects and events are unique to some extent. In setting up a class of objects 
or events, a number of criteria are usually established to determine member- 
ship in the class. Empirical classes of this type are never logically precise. 
Borderline cases can always be found which will defy any reasonable set of 
criteria. The concept can be simple and quantitative such as response time 
to a specified type of signal, or it can be complex and qualitative such as fear 
reaction to a particular threat or danger. 

The psychologist must make judgments of sameness or of discrimination 
in studying a concept or a class. Another type of judgment must be made 
by the psychologist in setting up a criterion for accepting or rejecting objects 
or events as members of the class. This is a judgment of the relevance or 
importance of various associated characteristics and conditions. For example, 
brightness of the signal might be quite important in studying response time, 
but relative humidity might be of negligible relevance. Often, of course, these 
judgments consist of hypotheses which, in other experiments, become experi- 
mental variables. 

To summarize, then, the basic ingredients of scientific study are observa- 
tion, concept formulation, generalization, and explanation. Let us turn 
then to the problem of the most effective methodology to get us from our 
original observations to explanations in terms of principles and laws. 


Methodological Considerations for Psychological Problems 


It seems useful at this point to formalize some of the basic considera- 
tions that are regarded as essential to sound methodology in psychology. 
For purposes of the present discussion, these will be discussed under seven 
main headings as follows: 


1. Defining and formulating problems 


The principal considerations in this regard were mentioned above and 
relate to the role of judgment in the development of concepts and classes. 
It should be clear from the discussion above that it is quite impossible to 
develop scientific knowledge without making judgments. It is desirable to 
utilize judgments which will provide as uniform conclusions from one re- 
searcher to another as is possible. It is believed that judgments of sameness 
fulfill this condition to a substantial degree and should be preferred. 

Judgments of sameness may be used in setting up the concepts, classes, 
and series which are essential to collecting data for use in analysis. It is 
clearly impossible to make a statistical study of a group of psychotics. On 
the other hand, it is possible to make such studies of the weight, the age, and 
the ability to answer a series of questions of the members of this group. Some 
attribute of the members of a group or a sample must be clearly specified for 
study. It gives us no scientific knowledge to merely study people without 
defining what we are studying about them. 

Another matter of importance in defining and formulating problems is 
that all verbal definitions are to some extent ambiguous because of the lack of 
precision in language. To achieve specificity in defining the aptitude or 
class to be studied and the function involved, it is desirable that these be 
stated in terms of the operations to be performed in selecting the observations 
which belong in the series or classes and in measuring or classifying each with 
respect to the attribute involved. These definitions should be such that all 
reasonable individuals perform essentially the same operations in the same 
way, and therefore obtain very similar results. 


2. Conditions and control 


This consideration owes its importance to the fact that it is impossible 
to duplicate any situation precisely. To obtain scientific knowledge it is 
necessary that those conditions which are relevant to a significant degree to 
the outcome of the results be controlled in such a way that they will not cause 
the researcher to make incorrect inferences. In dealing with people, such 
conditions as motivation, attention, and set are usually very relevant and 
frequently particularly difficult to control. 


3. Observing and perceiving 


The fundamental consideration here is that of directed observations of 
the precise attributes to be observed and classifying the facts perceived with 
maximum accuracy. No two observers ever see exactly the same thing. 
Furthermore, their previous experiences differ in ways which affect their 
perceptions of phenomena observed. It is therefore essential that the precise 
operations to be performed in observing and in classifying what is observed 
be clearly specified if the observations are to be public and objective and are 
to be such that the results can be verified by other investigators. 


4. Recording and communicating 


This is an important consideration because memories tend to be dim, 
vague, and sometimes distorted images of perception. Language, which is 
our sole basis for communicating observations, can never provide a precise 
report. Verbal reports of perception tend to be incomplete, biased, and 
incoherent. This indicates the importance of immediate recording and also 
of immediate judgments as to the sameness of the objects, events, and attri- 
butes involved and the relevance of other factors in the situation in influencing 
the attributes being observed. 


5. Sampling 

This is a consideration very familiar to this group. The usual aim is to 
obtain the type of sampling conditions which underlie the theoretical sampling 
variations provided by standard formulas. The basic consideration is to 
insure that every member of the class being studied has equal opportunity 
to be included in the sample. Unfortunately, practical working conditions 
frequently prevent this from being the case. A great deal of the problem, 
therefore, is concerned with estimating the effects of these deviations from 
theoretical sampling conditions. One procedure which has been found very 
effective in handling sampling problems is the repeating of a study in its 
entirety to obtain an empirical estimate of sampling fluctuation. Another 
effective procedure is to bracket the results by obtaining an upper and lower 
limit based on samples which represent opposite extremes of the biases or 
determining factors present. 
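
The replication procedure just described can be illustrated with a minimal sketch. It assumes a simulated population of scores and simple random sampling; the function name `replicate_study`, the population values, and the sample sizes are all hypothetical and are not drawn from any study mentioned here. The spread of the replicated means is the empirical estimate of sampling fluctuation, and it is set beside the value given by the standard formula.

```python
import random
import statistics


def replicate_study(population, sample_size, n_replications, seed=0):
    """Repeat the whole study several times; return the mean from each run."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_replications):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        means.append(statistics.mean(sample))
    return means


# Hypothetical population of 10,000 scores (simulated, not real data).
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

means = replicate_study(population, sample_size=100, n_replications=200)

# Empirical sampling fluctuation: spread of the means across replications.
empirical_se = statistics.stdev(means)

# Value from the standard formula, sigma / sqrt(N), for comparison.
theoretical_se = statistics.pstdev(population) / 100 ** 0.5
```

Under the idealized sampling assumed in this sketch the two figures should be close. When working conditions depart from the theoretical ones, they may differ, and the spread across replications is then the more direct estimate of fluctuation.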


6. Analyzing data 


The principal considerations in this area are those of accurate description 
and efficient comparison. It will be obvious from the preceding remarks that 
quantitative and logical categories describing observations are of great 
importance in facilitating analysis. 


7. Interpreting results 


The chief factors in this area are the acceptance and rejection of hypo- 
theses and similar problems regarding the degree of confidence to be given 
to the specific inferences or generalizations. The problems are primarily 
logical. It is important in using theoretical values to make allowances if 
theoretical conditions are not fulfilled. It is also desirable to avoid the error 
of accepting an alternate hypothesis merely because the data rejected the 
hypothesis being tested. Another important consideration is that the nature 
of the experimental studies and probabilities be kept in line and probabilities 
not be accepted as certainties but merely as working theories. 

It is hoped that this group is in general agreement with the basic me- 
thodological considerations outlined above. Stated in general terms, they 
sound familiar and acceptable. Using these as a basis, what conclusions do 
we reach regarding our practical research problems? At this point I would 
like to narrow the discussion to one specific type of problem which I believe 
is most deserving of attention from psychologists at the present time. 
Specifically, these are problems with very important social implications. This 
does not mean that I prefer applied research to basic research. There seems 
adequate evidence at the present time that work on applied problems can 
lead to the discovery of basic scientific knowledge. The terms in which the 
problem is formulated and the general research methods used are the de- 
termining factors in whether a research study yields a specific result or one 
which may be generalized. 

In the remarks that follow I am not proposing to tell specialists in learn- 
ing, clinical, and social psychology how to carry out their research studies. 
These specialists have been trained in particular methods and habits of 
thinking and it does not seem appropriate to urge them to abandon these 
procedures in favor of what to me seems a more promising approach. I am 
recommending that members of this Society give serious consideration to 
certain methods for studying problems including those in the fields of learning, 
clinical, and social psychology. 

The members of this Society have special training and experience in 
dealing with statistics from large samples. The methods of statistical analysis 
using computing and tabulating machines make it possible for persons with 
appropriate training to detect small differences even though they are obscured 
by the presence of a large number of disturbing factors. Similarly, the 
members of this Society have worked extensively with problems of testing 
involving the collection of a large number of observations on each individual 
in the sample. Not only is the number of observations usually large, but 
these observations frequently include a wide variety of attributes of the 
individual’s performance. Because of the relevance of this type of experience 
for effective work on the important social problems mentioned above, it is 
suggested that we leave the work with small sample theory to those psy- 
chologists making laboratory studies of rats, psychotics, and infants. 

My specific recommendation, then, is that in their research work the 
members of this Society first select a broad, practical area containing impor- 
tant unsolved psychological problems and then use the approach and pro- 
cedures described below to carry out research studies which will not only 
have immediate practical utility, but also contribute to our store of useful 
scientific knowledge. 


Guiding Principles for Specific Steps 
The specific approach and procedures recommended are discussed under 
the following five headings: (1) Defining and formulating the specific problem, 
(2) Designing the study, (3) Collecting the data, (4) Analyzing the data, and 
(5) Interpreting and reporting the results. 


1. Defining and formulating the specific problems 

Having selected the broad, practical area, the following guiding principles 
are proposed for defining and formulating the specific problem. First, it is 
suggested that the problem be related to human activities which are practical, 
general, and normal, and not artificial, special, and abnormal. This suggests 
concentration on the everyday problems of human beings. It further suggests 
a selection of the more general problems encountered by normal people rather 
than working on the unusual or special case. Another important point to be 
considered in defining the problem is to establish the general aim of the people 
involved in this type of situation or event. Without a knowledge of the indi- 
vidual’s goals, what he is trying to get to or get from, or what the individual 
is trying to do, his intentions, it is extremely difficult to formulate useful 
descriptions of his behavior. It should be noted that this type of problem will 
tend to be stated in broad and general rather than narrow and specific terms 
at the outset. The problem can be expected to be stated more as a program 
for research than as a specific study. Many specific problems for investiga- 
tion will be developed as the study progresses. 


2. Designing the study 

The principles proposed in connection with preparing the design for the 
study are centered particularly around two concepts. The first of these is a 
preference for studying the problem, at least initially, in its natural or real 
setting rather than in an artificial or laboratory situation. This procedure 
has obvious advantages in the relevance and the validity of the results 
obtained. It also makes it especially easy to make use of persons in collecting 
the data who have had substantial experience in the situation being studied. 
Some of the difficulties of such a procedure are the lack of standardization in 
various examples of the situation being investigated. Only recently has 
substantial progress been made on the problem of standardizing or adjusting 
for differences in conditions, and on producing uniformity in the perception 
of the goals or the general sets of the individuals with respect to the situation. 

The second principle proposed in designing the study is the systematic 
collection of a large representative sample of events or behaviors of the type 
being studied. The laboratory tradition in psychology has led to the design 
of studies including only a small number of observations because of the 
difficulties of processing large numbers of cases in the laboratory. If useful 
results are to be obtained from small samples, great care must be used in 
reducing experimental error by controlling as many as possible of the factors 
which might obscure the effects of the variable being studied. It seems 
especially important that a very large number of observations in the natural 
setting be obtained to provide the basis for the tentative generalizations and 
hypotheses which are needed as a basis for establishing causal principles. 
Too often, specific studies have been set up to investigate problems formu- 
lated solely on the basis of limited self-observation or hunches. 


3. Collecting the data 

There are two important phases to collecting data. The first of these 
involves the process of observing and classifying the relevant aspects of the 
situation. It is proposed that only very simple judgments be required of 
the observers. Insofar as the observer can direct his attention to such 
operations as counting, reading along a scale, making judgments of “same” or 
“different,” or judgments of “greater than” or “less than,” it can be expected 
that little distortion will be introduced into the results by the observational 
process itself. In many practical situations it is essential that simple infer- 
ences be made by the observer. Under these circumstances, it is important 
that the observers be capable of making the specific types of inferences and 
judgments required. It appears especially undesirable to collect opinions, 
inferences about causal reasons, and complex judgments regarding the 
appropriateness and the quality of behavior in situations where simpler types 
of data are available. It is much easier to obtain agreement of independent 
observers as to whether or not a specific act was performed than as to whether 
this particular act indicated good adjustment to the situation. The over-all 
clinical type of estimate or judgment is the least satisfactory type of basic data 
for research studies of the sort proposed here. 

The second phase of data collection is recording and reporting. It is 
futile to make precise observations if these are to be seriously distorted in the 
form in which they are reported. Immediate, on-the-spot recording by the 
observer is the most satisfactory procedure for collecting data. In situations 
where memory images must be used it is desirable that steps be taken to 
check the observations at the time they are made, and to reinforce them by 
recall and rehearsal shortly afterwards. The observer is also important in 
this phase. Insofar as perceptions and inferences may be distorted by previous 
experience in the specific type of situation, the observer should be selected 
and trained to minimize the introduction of bias from this source. 


4. Analyzing the data 


The fundamental principle governing the analysis of data is that the sole 
purpose of this step is the more efficient description of these data. No really 
new information is added during this process. All of the information must be 
inherent in the data as collected. Summaries and descriptive statistics 
frequently make it easier to see relationships and differences in the data, and 
they of course provide the basis for tests of significance. However, as ab- 
stractions, they always contain less information than the originally collected 
data. Once the data have been collected, nothing can be done with them to 
improve the comprehensiveness, specificity of detail, or validity of the in- 
formation they provide. 

In many laboratory studies the data are so consistent and the experi- 
mental error so small that little in the way of statistical analysis is required 
to reveal the essential relationships. In the type of study proposed here this 
is not the case. Unusual effort and skill are required to discover and describe 
the fundamental relationships which are concealed within the large mass of 
the initial data. All of the tricks of multivariate analysis including factor 
analysis, partial correlation, and discriminant analysis will have to be used 
for effective handling of these problems. 


5. Interpreting and reporting the results 


Although most of the problems involved in interpreting and reporting 
results are common to all types of research, there are certain aspects which 
seem to deserve special emphasis here. The first responsibility of the in- 
vestigator is to describe precisely the problem selected for investigation. 
Similarly, if the observations were not made on a random sample of human 
beings, the nature of the group should be reported as specifically as possible. 
Since it is usually hoped that the results can be generalized to other groups, 
any limitations imposed by the nature of the specific group used should be 
brought into clear focus. In similar fashion, the procedures in each of the 
other steps listed above should be clearly described so that the decisions of 
the investigator in carrying out the various steps of the study can be reviewed 
and evaluated by those interested in the results. It is important that limita- 
tions of a study be clearly reported. It is also essential that the research 
worker make available to others his considered judgment regarding the 
degree of credibility which should be attached to these findings. This is 
frequently a difficult type of judgment to make, but it is usually one the 
investigator is better prepared to make than his colleagues who are interested 
in utilizing his findings. 


Examples of Applications 


By way of conclusion, some of the implications of the above remarks for 
rather specific problems will be mentioned. Probably the best known appli- 
cation of the approach and the procedures described above is in connection 
with job analysis. In this area much progress has been made in substituting 
systematically collected factual data for the opinions, hunches, and general 
impressions of various types of observers. The critical incident technique 
and related procedures for collecting reports of observations made in accord- 
ance with detailed instructions and criteria have produced uniformly ex- 
cellent results in the hands of personnel trained in their use. Detailed 
statements of the requirements for industrial plant employees, office workers, 
dentists, infantrymen, combat leaders, aircrew members, research workers, 
and many other groups are now available. 

The detailed statements of job requirements obtained in the type of study 
described in the preceding paragraph have led directly to the development 
of new procedures for constructing selection tests, proficiency measures, and 
criteria of job performance. The importance of objective procedures, simple 
judgments, frequent recording, and precise criteria has been demonstrated 
for a wide variety of types of measures in these areas. 

Not so well known are recent studies of the learning process utilizing this 
general approach. Following the procedures described above, studies of 
learning have centered around two of the most common learning situations. 
The first of these is the teacher-student situation. The activities of the 
teacher in the typical school situation which promote learning or interfere 
with it have been studied by a number of investigators following the general 
methods outlined here. A similar study has been made of pilot instructors. 
In this situation all of the flight instruction is given individually. Since 
motivation is usually high and the opportunities for acquiring skill other 
than in the instructor-student situation are small, this provides an excellent 
opportunity to study the instructional process. The preliminary results of 
these studies suggest that this approach has substantial promise as an aid to 
gaining a better understanding of the problems of learning. 

The second type of study on the learning problem using this general 
approach consists of a systematic analysis of the process of getting informa- 
tion from written materials. Although this study is in an even more prelimi- 
nary stage, it seems likely to make a definite contribution to the practical 
problems of human learning. 

A few studies have also been made now applying this methodology to 
problems of clinical psychology. Perhaps the best example to date is the 
development of a tentative definition in behavioral terms of immaturity 
reaction. This study was recently completed by Dr. Leo Eilbert. The 
acceptability of his detailed behavioral definition to a panel of fifteen psy- 
chiatrists to whom it was submitted for review is interpreted as an indication 
of great promise for the practical usefulness of the results which can be 
expected from applying these procedures on a larger scale to other groups 
showing unusual behavior. Work has also been done on studies of the thera- 
peutic process. It is hoped that this general type of procedure will make it 
possible to replace much of the subjective opinion and impression in this 
field with data of a more objective and factual nature. 

The last application proposed is to the problems of social psychology. 
This field also seems to be replete with studies of opinions, ratings, impressions, 
and theories stated in such general terms that they cannot be tested. It is 
believed that many of the studies of reported attitudes, expressed preferences, 
estimated motivations, and role judgments can be replaced by reports of 
specific observed behavior with great profit to the advancement of knowledge 
in this field. 

Before closing, let me reemphasize the fact that most if not all of the 
principles discussed here are both well known and regularly used by many 
of the members of this group. This presentation has attempted to formalize 
and underline some of the more important aspects of this approach and to 
try to encourage you and your students to exploit more fully the training and 
experience which provide a sound basis for making substantial progress in 
solving urgent practical problems and at the same time increasing our store 
of scientific knowledge regarding human behavior. 


A STATISTICAL DESCRIPTION OF VERBAL LEARNING* 


GEORGE A. MILLER AND WILLIAM J. McGILL 


MASSACHUSETTS INSTITUTE OF TECHNOLOGY 


Free-recall verbal learning is analyzed in terms of a probability model. 
The general theory assumes that the probability of recalling a word on any 
trial is completely determined by the number of times the word has been 
recalled on previous trials. Three particular cases of this general theory are 
examined. In these three cases, specific restrictions are placed upon the 
relation between probability of recall and number of previous recalls. The 
application of these special cases to typical experimental data is illustrated. 
An interpretation of the model in terms of set theory is suggested but is not 
essential to the argument. 


The verbal learning considered in this paper is the kind observed in the 
following experiment: A list of words is presented to the learner. At the 
end of the presentation he writes down all the words he can remember. This 
procedure is repeated through a series of n trials. At the present time we are 
not prepared to extend the statistical theory to a wider range of experimental 
procedures. 


The General Model 


We shall assume that the degree to which any word in the test material 
has been learned is completely specified by the number of times the word has 
been recalled on preceding trials. In other words, the probability that a 
word will be recalled on trial n + 1 is a function of k, the number of times 
it has been recalled previously. (Symbols and their meanings are listed in 
Appendix C at the end of the paper.) 

Let the probability of recall after k previous recalls be symbolized by 
$\tau_k$. Then the corresponding probability of failing to recall the word is 
$1 - \tau_k$. When a word has been recalled exactly k times on the preceding 
trials, we shall say that the word is in state $A_k$. Thus before the first trial 
all the words are in state $A_0$; that is to say, they have been recalled zero 
times on previous trials. Ideally, on the first trial a proportion $\tau_0$ of these 
words is recalled and so passes from state $A_0$ to state $A_1$. The proportion 
$1 - \tau_0$ is not recalled and so remains in state $A_0$. On the second trial the 
words that remained in $A_0$ undergo the same transformation as before. Of 
those in $A_1$, however, the proportion $1 - \tau_1$ is not recalled and so remains 
in $A_1$. 

*This research was facilitated by the authors’ membership in the Inter-University 
Summer Seminar of the Social Science Research Council, entitled Mathematical Models 
for Behavior Theory, held at Tufts College, June 28-August 24, 1951. The authors are 
especially grateful to Dr. F. Mosteller for advice and criticism that proved helpful on 
many different occasions. 

One general problem is to determine the proportion of words expected 
in state $A_k$ on trial n. Let $p(A_k, n)$ represent the probability that a word is 
in state $A_k$ on trial n. Since these are probabilities, they must sum to unity 
on any given trial: 


$$\sum_k p(A_k, n) = 1.$$


The number of trials and the total number of times a word has been recalled 
must assume non-negative, integral values. We assume that a word can be 
recalled only once per trial at most, so the number of recalls cannot exceed 
the number of trials. Therefore, we have 


$$p(A_k, n) = 0 \quad \text{for } k > n,\ n < 0,\ k < 0.$$

We also assume that none of the words can have been recalled before the 
first trial, so for n = 0, 


$$p(A_k, 0) = \begin{cases} 1 & \text{for } k = 0, \\ 0 & \text{for } k > 0. \end{cases}$$


For all trials we have the difference equation: 
p(A, , nm + 1) = p(Ai, n)(1 — 7m) + p(An-i , n) 7-1 - (1) 


This equation reflects the fact that a word can get into state A, on trial 
n + 1 in only two ways: (a) either it is in A, on trial n and is not recalled 
on trial n + 1, or (b) it is in A,_, on trial n and is recalled on trial n + 1. 
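
The difference equation (1) can be iterated directly. The following is a minimal sketch, not part of the original paper: the function name `state_probabilities` and the numerical values of the transitional probabilities are assumptions chosen only for illustration.

```python
def state_probabilities(tau, n_trials):
    """Iterate difference equation (1).

    tau[k] is the probability of recall for a word in state A_k.
    Returns rows p[n][k] = p(A_k, n) for n = 0..n_trials.
    """
    p = [[1.0] + [0.0] * n_trials]            # trial 0: every word is in A_0
    for n in range(n_trials):
        prev = p[-1]
        row = [0.0] * (n_trials + 1)
        for k in range(n + 2):                # states reachable by trial n + 1
            stay = prev[k] * (1.0 - tau[k])
            move_up = prev[k - 1] * tau[k - 1] if k > 0 else 0.0
            row[k] = stay + move_up
        p.append(row)
    return p


# Hypothetical transitional probabilities, chosen only for illustration.
tau = [0.2, 0.4, 0.6, 0.8, 0.9]
p = state_probabilities(tau, n_trials=4)
for n, row in enumerate(p):
    print(n, [round(x, 4) for x in row], "sum =", round(sum(row), 6))
```

Each printed row is the distribution over states on one trial; the row sums stay at unity, as required by the normalization condition above.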

The following rationalization for this scheme is in the spirit of the
statistical theories of learning developed by Bush and Mosteller (1) and by Estes
(3). The rationalization is not necessary for the development of the
mathematics, but it gives an alternative way of thinking about the present model
and helps to clarify its relation to the earlier theories. On the first
presentation of the list of words a random sample of stimulus elements is
conditioned to the appropriate response for each word. The measure of this
set of conditioned elements is $\tau_0$. (The total measure of the set of all stimulus
elements for a given word is assumed to be unity, so the measure can be
regarded as a probability.) If a word is not recalled, the measure of
conditioned elements for that word is unchanged. But if a word is recalled, the
proportion of conditioned elements is increased. The effect of recalling a
word is to take another random sample of elements from the total set and to
condition them. The proportion of elements conditioned when a word in
state $A_k$ is recalled is $\tau_{k+1} - \tau_k$. More precise interpretation of this
set-theoretical argument will be presented when we consider the special cases of
the general theory.

The general solution of (1) when all the $\tau_k$ are different is (see Appendix A):

$$p(A_0, n) = (1 - \tau_0)^n \quad \text{for } k = 0,$$

$$p(A_k, n) = \tau_0 \tau_1 \cdots \tau_{k-1} \sum_{i=0}^{k} \frac{(1 - \tau_i)^n}{\prod_{j=0,\ j \neq i}^{k} (\tau_j - \tau_i)} \quad \text{for } k > 0. \tag{2}$$

The denominator of each of the fractions in the summation includes all
differences of the form $(\tau_j - \tau_i)$ except for the zero difference $(\tau_i - \tau_i)$.
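
As a check on (2), the closed form can be compared numerically with the direct iteration of (1). This is only a sketch under assumed, illustrative values of the $\tau_k$ (all distinct, as the solution requires); the function names are hypothetical.

```python
from math import prod


def p_recursive(tau, n_trials):
    """p(A_k, n) for n = 0..n_trials by iterating difference equation (1)."""
    rows = [[1.0] + [0.0] * n_trials]
    for n in range(n_trials):
        prev = rows[-1]
        rows.append([
            prev[k] * (1.0 - tau[k]) + (prev[k - 1] * tau[k - 1] if k > 0 else 0.0)
            for k in range(n_trials + 1)
        ])
    return rows


def p_closed(tau, k, n):
    """General solution (2); requires all tau[0..k] to be different."""
    if k == 0:
        return (1.0 - tau[0]) ** n
    lead = prod(tau[:k])                       # tau_0 * tau_1 * ... * tau_{k-1}
    total = 0.0
    for i in range(k + 1):
        denom = prod(tau[j] - tau[i] for j in range(k + 1) if j != i)
        total += (1.0 - tau[i]) ** n / denom
    return lead * total


# Hypothetical, distinct transitional probabilities.
tau = [0.2, 0.4, 0.6, 0.8, 0.9]
rows = p_recursive(tau, n_trials=4)
max_gap = max(abs(p_closed(tau, k, n) - rows[n][k])
              for n in range(5) for k in range(5))
print("largest disagreement between (1) and (2):", max_gap)
```

The closed form involves differences of nearly equal quantities when the $\tau_k$ are close together, so for larger $k$ the iteration of (1) is likely to be the more stable way to compute.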
The expected number of times a word is recalled, all told, up to and
including trial $n$, is, by definition,

$$E(k, n) = \sum_{k=0}^{n} k\, p(A_k, n). \tag{3}$$

The expected proportion of words recalled on trial $n + 1$ is the difference,
$E(k, n + 1) - E(k, n)$, between the cumulative values on successive trials.
This difference is the theoretical recall score and we symbolize it by $\rho_{n+1}$.
Thus we have the general relation

$$\rho_0 = 0, \quad \text{for } n = 0,$$

$$\rho_{n+1} = E(k, n + 1) - E(k, n), \quad \text{for } n + 1 > 0. \tag{4}$$

An alternative expression for $\rho_{n+1}$ can be obtained as follows. On trial
$n$ the probability that a word is in state $A_k$ is $p(A_k, n)$. The probability of
recall in state $A_k$ is $\tau_k$. The product $\tau_k \cdot p(A_k, n)$ is, therefore, the probability
that a word will both be in $A_k$ on trial $n$ and also be recalled on trial $n + 1$.
If these joint probabilities are summed over all the states $A_k$ from $k = 0$
to $k = n$, we have the total probability that a word will be recalled on trial
$n + 1$. That is to say, we have $\rho_{n+1}$:

$$\rho_{n+1} = \sum_{k=0}^{n} \tau_k\, p(A_k, n). \tag{5}$$

The two expressions (4) and (5) are equivalent, which can be shown as
follows. From (3) and (4) together we have

$$\rho_{n+1} = \sum_{k=0}^{n+1} k\, p(A_k, n + 1) - \sum_{k=0}^{n} k\, p(A_k, n).$$

The first summation on the right can be rewritten by substituting for
$p(A_k, n + 1)$ according to (1):

$$\begin{aligned}
\sum_{k=0}^{n+1} k\, p(A_k, n + 1) &= \sum_{k=0}^{n+1} k\, p(A_k, n)(1 - \tau_k) + \sum_{k=0}^{n+1} k\, p(A_{k-1}, n)\,\tau_{k-1} \\
&= \sum_{k=0}^{n} k\, p(A_k, n) - \sum_{k=0}^{n} k\, p(A_k, n)\,\tau_k + \sum_{k=0}^{n} (k + 1)\, p(A_k, n)\,\tau_k .
\end{aligned}$$

When this result is substituted into the expression for $\rho_{n+1}$, we have

$$\begin{aligned}
\rho_{n+1} &= -\sum_{k=0}^{n} k\, p(A_k, n)\,\tau_k + \sum_{k=0}^{n} (k + 1)\, p(A_k, n)\,\tau_k \\
&= \sum_{k=0}^{n} \tau_k\, p(A_k, n),
\end{aligned}$$

which is the desired result.
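
The equivalence of (4) and (5) can also be confirmed numerically. The sketch below is illustrative only; the $\tau_k$ values and the helper names are assumptions, not quantities from the paper.

```python
def state_probabilities(tau, n_trials):
    """p(A_k, n) for n = 0..n_trials by iterating difference equation (1)."""
    rows = [[1.0] + [0.0] * n_trials]
    for n in range(n_trials):
        prev = rows[-1]
        rows.append([
            prev[k] * (1.0 - tau[k]) + (prev[k - 1] * tau[k - 1] if k > 0 else 0.0)
            for k in range(n_trials + 1)
        ])
    return rows


# Hypothetical transitional probabilities (one per state A_0..A_6).
tau = [0.2, 0.35, 0.5, 0.6, 0.7, 0.75, 0.8]
n_trials = 6
p = state_probabilities(tau, n_trials)


def expected_recalls(n):
    """E(k, n) of equation (3)."""
    return sum(k * p[n][k] for k in range(n + 1))


def recall_score(n):
    """rho_{n+1} of equation (5)."""
    return sum(tau[k] * p[n][k] for k in range(n + 1))


# Equation (4) says rho_{n+1} = E(k, n+1) - E(k, n); compare with (5).
gaps = [abs(recall_score(n) - (expected_recalls(n + 1) - expected_recalls(n)))
        for n in range(n_trials)]
print("largest difference between (4) and (5):", max(gaps))
```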

The asymptotic behavior of the model as $n$ increases without limit can
be deduced from the general solution (2). First consider the case in which
one or more of the transitional probabilities $\tau_k$ is zero. All the words start
in state $A_0$ and have a positive probability of moving along to states $A_1$,
$A_2$, etc., up to the first state, $A_h$, with zero transitional probability, $\tau_h = 0$.
There the words are trapped; eventually all the words are recalled exactly
$h$ times and cannot be recalled again. This fact can be seen from (2): If
$\tau_i > 0$, then all the terms $(1 - \tau_i)^n$ in (2) go to zero as $n \to \infty$. Thus $p(A_k, n)$
goes to zero for $k < h$. For $k > h$, the product in front of the summation
must include $\tau_h = 0$, and so $p(A_k, n) = 0$ for $k > h$. When $k = h$, however,
$(1 - \tau_h)^n = (1 - 0)^n = 1$, and so this term in the summation of (2) does not
go to zero. Instead, when $\tau_h = 0$ and $\tau_i > 0$ for $i < h$,

$$\lim_{n \to \infty} p(A_h, n) = \frac{\tau_0 \tau_1 \cdots \tau_{h-1}}{\tau_0 \tau_1 \cdots \tau_{h-1}} = 1.$$

The recall score, $\rho_{n+1}$, then approaches zero as an asymptote; from (5),

$$\lim_{n \to \infty} \rho_{n+1} = \sum_{k=0}^{\infty} \tau_k \left[ \lim_{n \to \infty} p(A_k, n) \right] = 0,$$

since the probability at the asymptote is concentrated at state $A_h$, and for
this state $\tau_h = 0$. This case is of little interest for an acquisition theory,
since the asymptote of the learning curve is at zero. Therefore, in what
follows, we shall be concerned only with the case in which all the $\tau_k$ are
different and greater than zero.

If all the transitional probabilities $\tau_k$ are greater than zero, then from
(2) we see that as $n$ approaches infinity all the terms in the summation go
toward zero for all finite values of $k$. Consequently the sum of the $p(A_k, n)$
can be made as near zero as we please for any finite $k$ by selecting a large
enough value of $n$. In the limit, therefore, the probability of any finite
number of recalls is zero. Since the sum of the $p(A_k, n)$ must equal unity,
almost all the probability comes to be concentrated in state $A_\infty$ and we have
for the limit when all $\tau_k > 0$,

$$p(A_\infty, \infty) = 1.$$

We are now able to show that a word in state $A_k$ has probability one of
moving to state $A_{k+1}$, if the learning process is continued indefinitely. This
happens because almost all words eventually reach state $A_\infty$. Thus we can
write, for the probability of leaving state $A_k$ on some trial,

$$\sum_{n=k}^{\infty} \tau_k\, p(A_k, n) = 1,$$

or,

$$\sum_{n=k}^{\infty} p(A_k, n) = \frac{1}{\tau_k} \quad \text{for } \tau_k > 0.$$
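
The relation $\sum_n p(A_k, n) = 1/\tau_k$ lends itself to a quick numerical check. The sketch below follows only the first few states over a long run of trials; the $\tau_k$ values, the number of trials, and the function name are assumptions made for illustration.

```python
def occupancy_totals(tau, n_trials):
    """Sum p(A_k, n) over n = 0..n_trials-1 for the states k < len(tau).

    Probability that moves beyond the last tracked state is simply dropped,
    which does not affect the states that are tracked.
    """
    n_states = len(tau)
    p = [1.0] + [0.0] * (n_states - 1)        # trial 0: all words in A_0
    totals = [0.0] * n_states
    for _ in range(n_trials):
        totals = [t + x for t, x in zip(totals, p)]
        p = [p[k] * (1.0 - tau[k]) + (p[k - 1] * tau[k - 1] if k > 0 else 0.0)
             for k in range(n_states)]
    return totals


# Hypothetical transitional probabilities, all greater than zero.
tau = [0.2, 0.4, 0.6, 0.8]
totals = occupancy_totals(tau, n_trials=500)
for k, (t, total) in enumerate(zip(tau, totals)):
    print(f"k={k}: sum over trials = {total:.6f}, 1/tau_k = {1.0 / t:.6f}")
```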


In all the cases we shall consider in this paper the value of $\tau_k$ will approach
an asymptote as $k \to \infty$. We are interested in placing the following
restrictions on the $\tau_k$:

$$\tau_i \neq \tau_j \quad (i \neq j),$$

$$\tau_k > 0,$$

$$\lim_{k \to \infty} \tau_k = m \le 1.$$

The first two conditions insure that $p(A_k, n)$ goes toward zero for finite $k$
and large $n$. The third condition provides the asymptotic value of $\tau_k$ for
infinite $k$. In the summation for the limiting value of $\rho_{n+1}$, all terms are zero
out to infinity, and so we have

$$\lim_{n \to \infty} \rho_{n+1} = m\, p(A_\infty, \infty) = m. \tag{5'}$$

In other words, if we assume that $m$ is the asymptotic value of $\tau_k$ as $k \to \infty$,
then $m$ is also the asymptotic value of $\rho_{n+1}$ as $n \to \infty$.
In the special cases discussed below, a restriction is placed upon the
value of $\tau_k$ in the form of the linear difference equation,*

$$\tau_{k+1} = a + \alpha \tau_k , \tag{6}$$

where $0 < a < 1$ and $0 < \alpha \le 1 - a$. The limits for $\alpha$ have been chosen so
that $\tau_{k+1}$ is bounded between zero and one and, since we are interested in
acquisition, so that $\tau_{k+1} > \tau_k$.

Consider the following development of (5):

$$\rho_{n+2} = \sum_{k=0}^{n+1} \tau_k\, p(A_k, n + 1).$$

*We have tried to observe the convention that parameters are represented by Greek
letters and statistical estimates are represented by Roman letters. In the case of $a$ and
$m$, however, we have violated this convention in order to make our symbols coincide with
those used by other workers. The symbols $m$, $a$, $\alpha$, and $p$ were originally proposed by
Bush and Mosteller.

Now substitute for $p(A_k, n + 1)$ according to (1):

$$\begin{aligned}
\rho_{n+2} &= \sum_{k=0}^{n+1} \tau_k\, p(A_k, n)(1 - \tau_k) + \sum_{k=0}^{n+1} \tau_k\, p(A_{k-1}, n)\,\tau_{k-1} \\
&= \rho_{n+1} - \sum_{k=0}^{n} \tau_k^2\, p(A_k, n) + \sum_{k=0}^{n} \tau_{k+1} \tau_k\, p(A_k, n).
\end{aligned}$$

Next we substitute for $\tau_{k+1}$ according to (6):

$$\begin{aligned}
\rho_{n+2} &= \rho_{n+1} - \sum_{k=0}^{n} \tau_k^2\, p(A_k, n) + \sum_{k=0}^{n} (a + \alpha \tau_k)\,\tau_k\, p(A_k, n) \\
&= (1 + a)\rho_{n+1} - (1 - \alpha) \sum_{k=0}^{n} \tau_k^2\, p(A_k, n) \\
&= (1 + a)\rho_{n+1} - (1 - \alpha) E(\tau_k^2, n + 1),
\end{aligned} \tag{7}$$

where $E(\tau_k^2, n + 1)$ is the second raw moment of the $\tau_k$ (as $\rho_{n+1}$ is the first
raw moment) for trial $n + 1$.
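
Relation (7) ties the recall score on one trial to the first two raw moments of the $\tau_k$ on the preceding trial. A minimal numerical sketch follows; the values of $a$, $\alpha$, and $\tau_0$ are hypothetical and were chosen only so that restriction (6) is satisfied.

```python
def state_probabilities(tau, n_trials):
    """p(A_k, n) for n = 0..n_trials by iterating difference equation (1)."""
    rows = [[1.0] + [0.0] * n_trials]
    for n in range(n_trials):
        prev = rows[-1]
        rows.append([
            prev[k] * (1.0 - tau[k]) + (prev[k - 1] * tau[k - 1] if k > 0 else 0.0)
            for k in range(n_trials + 1)
        ])
    return rows


# Hypothetical parameters satisfying 0 < a < 1 and 0 < alpha <= 1 - a.
a, alpha, tau0, n_trials = 0.1, 0.8, 0.2, 12

tau = [tau0]
for _ in range(n_trials + 1):
    tau.append(a + alpha * tau[-1])            # restriction (6)

rows = state_probabilities(tau, n_trials)


def moment(n, power):
    """Raw moment of the tau_k on trial n + 1: sum of tau_k^power * p(A_k, n)."""
    return sum(tau[k] ** power * rows[n][k] for k in range(n + 1))


# Left side of (7) is rho_{n+2}; right side uses rho_{n+1} and the second moment.
gaps = [abs(moment(n + 1, 1) - ((1 + a) * moment(n, 1) - (1 - alpha) * moment(n, 2)))
        for n in range(n_trials)]
print("largest departure from (7):", max(gaps))
```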

Restriction (6) brings the system into direct correspondence with a
special case of the theory developed by Bush and Mosteller. In their
terminology, an operator $Q_1$ is applied to the probability of response, $p$, to give
$a_1 + \alpha_1 p$ as the new probability whenever a trial is successful. A second
operator $Q_2$ is applied to give $a_2 + \alpha_2 p$ whenever a trial is unsuccessful. In
the present application of this more general theory, $Q_1$ is preserved intact by
restriction (6), but $Q_2$ is assumed to be the identity operator. That is to
say, $a_2$ is zero and $\alpha_2$ is unity, so $Q_2 p = p$. In the present application, an
unsuccessful trial consists of the omission of the word during recall. It
seems reasonable to assume that the non-occurrence of a word has no effect
upon its probability of occurrence on the next trial. How successful this
simple assumption is will be seen when we examine the data.


Analysis of the Data 


At the end of the experiment the experimenter has collected a set of 
word lists—the words recalled by the learner on successive trials. These 
recall lists will usually contain a small number of words that did not occur 
in the presentation. These spontaneous additions by the learner are of some 
interest in themselves, but we shall ignore them in the present discussion. 

We would like to use the data contained in the word lists to obtain an
estimate of $\rho_{n+1}$ in (5). We shall refer to the estimate as $r_{n+1}$. There are,
we suppose, $N$ words provided by the experimenter as learning material in
the experiment. It seems reasonable to assume that under certain
conditions these words are homogeneous. By this we imply that the responses
to all of the words in state $A_k$ may be considered as estimates of the same
transitional probability of recall, $\tau_k$.

We can then define a convenient statistic,

$$r_{n+1} = \frac{1}{N} \sum_{k=0}^{n} \sum_{i=1}^{N} X_{i,k,n+1} . \tag{8}$$
The numbers, $X_{i,k,n+1}$, are either zero or one. The subscripts $k$ and $n + 1$
have the same meaning that we have attached to them previously. They
indicate that we are looking at an event that occurs on trial $n + 1$ to a word
in state $A_k$. The first summation is carried out over $i$, the experimental
words, with $k$ fixed to show that we count the number of words in each state.
The rules that determine whether an $X_{i,k,n+1}$ is zero or one are
straightforward. The $X_{i,k,n+1}$ are zero for all words not in state $A_k$ when summing
on $i$. They are zero for any word in state $A_k$, if a recall fails to occur on trial
$n + 1$. Lastly the $X_{i,k,n+1}$ are 1 for any word in state $A_k$, provided that a
recall occurs on trial $n + 1$. The second summation extends over $k$, the
various states. This summation goes only up to $n$ because our reference
point for determining the number of states is trial $n$. These rules determine
$r_{n+1}$ as the proportion of correct responses to the $N$ experimental words on
trial $n + 1$.

To show that $r_{n+1}$ is unbiased we observe that

$$E(r_{n+1}) = \frac{1}{N} \sum_{k=0}^{n} E\left( \sum_{i=1}^{N} X_{i,k,n+1} \right).$$

The expectation of any $X_{i,k,n+1}$ in state $A_k$ is $\tau_k$. Thus the expectation of
the sum in the brackets is $N \cdot \tau_k \cdot p(A_k, n)$. Substituting this into the
expression for $E(r_{n+1})$, we find

$$E(r_{n+1}) = \sum_{k=0}^{n} \tau_k\, p(A_k, n),$$

$$E(r_{n+1}) = \rho_{n+1} . \tag{9}$$


The sampling variance of $r_{n+1}$ around $\rho_{n+1}$ is determined by the variances
of the various $X_{i,k,n+1}$ around the transitional probabilities, $\tau_k$.

$$\operatorname{Var}(r_{n+1}) = \frac{1}{N^2} \sum_{k=0}^{n} \operatorname{Var}\left( \sum_{i=1}^{N} X_{i,k,n+1} \right).$$

The variance of any $X_{i,k,n+1}$ in state $A_k$ is binomial and is given by $\tau_k(1 - \tau_k)$.
The variance of $\sum_i X_{i,k,n+1}$ thus becomes $N p(A_k, n)\,\tau_k (1 - \tau_k)$.
Substituting this into the expression for $\operatorname{Var}(r_{n+1})$, we obtain

$$\operatorname{Var}(r_{n+1}) = \frac{1}{N} \sum_{k=0}^{n} p(A_k, n)\,\tau_k (1 - \tau_k). \tag{10}$$

It should be noted that this variance is never larger than the binomial variance

$$\frac{1}{N}\, \rho_{n+1} (1 - \rho_{n+1}),$$

since the binomial variance includes in addition to (10) a term that depends
on the variance of the $\tau_k$ around $\rho_{n+1}$,

$$\operatorname{Var}(r_{n+1}) = \frac{1}{N}\, \rho_{n+1}(1 - \rho_{n+1}) - \frac{1}{N} \left[ \sum_{k=0}^{n} \tau_k^2\, p(A_k, n) - \rho_{n+1}^2 \right]. \tag{10'}$$
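
The two forms of the sampling variance, (10) and (10'), and the bound by the binomial variance can be illustrated numerically. In this sketch the transitional probabilities and the list length $N$ are hypothetical values, not data from the experiments reported here.

```python
def state_probabilities(tau, n_trials):
    """p(A_k, n) for n = 0..n_trials by iterating difference equation (1)."""
    rows = [[1.0] + [0.0] * n_trials]
    for n in range(n_trials):
        prev = rows[-1]
        rows.append([
            prev[k] * (1.0 - tau[k]) + (prev[k - 1] * tau[k - 1] if k > 0 else 0.0)
            for k in range(n_trials + 1)
        ])
    return rows


# Hypothetical transitional probabilities and list length.
tau = [0.2, 0.35, 0.5, 0.6, 0.7, 0.75, 0.8]
N = 50
n_trials = 6
p = state_probabilities(tau, n_trials)

results = []
for n in range(n_trials):
    rho = sum(tau[k] * p[n][k] for k in range(n + 1))                  # (5)
    second = sum(tau[k] ** 2 * p[n][k] for k in range(n + 1))
    var_10 = sum(p[n][k] * tau[k] * (1 - tau[k]) for k in range(n + 1)) / N
    binomial = rho * (1 - rho) / N
    var_10_prime = binomial - (second - rho ** 2) / N                   # (10')
    results.append((var_10, var_10_prime, binomial))
    print(f"trial {n + 1}: Var = {var_10:.6f}, binomial bound = {binomial:.6f}")
```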

In order to apply the general theory we must obtain estimates of the
transitional probabilities, $\tau_k$. Now $\tau_k$ is the probability of moving from
state $A_k$ to $A_{k+1}$ and is assumed to be constant from trial to trial. After
trial $n$ a certain number of words, $N_{k,n}$, are in state $A_k$. Of these $N_{k,n}$ words,
some go on to $A_{k+1}$ and some remain in $A_k$ on trial $n + 1$. The fraction that
moves up to $A_{k+1}$ provides an estimate of $\tau_k$ on that trial. Therefore, on
every trial we obtain an estimate of $\tau_k$. Call these estimates $t_{k,n+1}$. Then

$$t_{k,n+1} = \frac{\sum_{i=1}^{N} X_{i,k,n+1}}{N_{k,n}} .$$

If $N_{k,n}$ is zero, no estimate is possible.

Next we wish to combine the $t_{k,n+1}$ to obtain a single estimate, $t_k$, of
the transitional probability, $\tau_k$. The least-squares solution, obtained by
minimizing $\sum_n (t_{k,n+1} - \tau_k)^2$, is the direct average of the $t_{k,n+1}$. This estimate
is unbiased, but it has too large a variance because it places undue emphasis
upon the $t_{k,n+1}$ that are based on small values of $N_{k,n}$. We prefer, therefore,
to use the maximum-likelihood estimate,

$$t_k = \frac{\sum_n N_{k,n}\, t_{k,n+1}}{\sum_n N_{k,n}} , \tag{11}$$

which respects the accuracy of the various $t_{k,n+1}$.

For example, after trial 7 there may be 10 words in a particular state $A_k$. Of these
10, 6 are recalled on trial 8. This gives the estimate $t_{k,8} = 6/10$. Every
trial on which $N_{k,n} \neq 0$ provides a similar estimate, $t_{k,n+1}$. The final estimate
of $\tau_k$ is obtained by weighting each of these separate estimates according to
the size of the sample on which it is based and then averaging. This
procedure is repeated for all the $\tau_k$ individually as far as the data permit.

The $t_{k,n+1}$ are also useful to check the basic assumption that $\tau_k$ is
independent of $n$. If the $t_{k,n+1}$ show a significant trend, this basic assumption
is violated.
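
The estimation procedure can be sketched in a few lines. The data layout below (one row per word, one 0/1 entry per trial) and the function name `transition_estimates` are assumptions made for illustration; the tiny data set is invented and is not taken from any of the experiments.

```python
from collections import defaultdict


def transition_estimates(recalls):
    """Estimate transitional probabilities from recall records.

    recalls[i][n] is 1 if word i was recalled on trial n + 1, else 0.
    Returns (per_trial, pooled):
      per_trial[(k, n + 1)] = t_{k,n+1}, defined only when N_{k,n} > 0;
      pooled[k] = estimate (11), each t_{k,n+1} weighted by N_{k,n}.
    """
    n_words = len(recalls)
    n_trials = len(recalls[0])
    state = [0] * n_words                      # number of previous recalls per word
    counts = defaultdict(int)                  # (k, n) -> N_{k,n}
    hits = defaultdict(int)                    # (k, n) -> words recalled on trial n + 1
    for n in range(n_trials):
        for i in range(n_words):
            counts[(state[i], n)] += 1
            hits[(state[i], n)] += recalls[i][n]
        for i in range(n_words):
            state[i] += recalls[i][n]          # a recalled word moves up one state
    per_trial = {(k, n + 1): hits[(k, n)] / counts[(k, n)] for (k, n) in counts}
    pooled = {}
    for k in sorted({k for (k, _) in counts}):
        recalled = sum(h for (kk, _), h in hits.items() if kk == k)
        at_risk = sum(c for (kk, _), c in counts.items() if kk == k)
        pooled[k] = recalled / at_risk
    return per_trial, pooled


# Invented records for four words over three trials (illustration only).
recalls = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
per_trial, pooled = transition_estimates(recalls)
print(per_trial)
print(pooled)
```

Pooling the counts over trials is the same as weighting each $t_{k,n+1}$ by $N_{k,n}$, so trials on which few words occupy a state contribute correspondingly little.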


The Simplest Case: One Parameter 


The computation of $p(A_k, n)$ from (2) for the general case is exceedingly
tedious as $n$ and $k$ become moderately large. We look, therefore, for a simple
relation among the $\tau_k$ of the form of restriction (6). The first case that we
shall consider is

$$\tau_0 = a,$$

$$\tau_{k+1} = a + (1 - a)\tau_k . \tag{12}$$

In this form the model contains only the single parameter, $a$. The solution
of the difference equation (12) is

$$\tau_k = 1 - (1 - a)^{k+1} . \tag{13}$$

The interpretation of (13) in set-theoretical terms runs as follows: On
the first presentation of the list a random sample of elements is conditioned
for each word. The measure of this sample is $a$, and it represents the
probability, $\tau_0$, of going from state $A_0$ to state $A_1$. If a word is not recalled, no
change is produced in the proportion of conditioned elements. When a
word is recalled, however, the effect is to condition another random sample
of elements, drawn independently of the first sample, of measure $a$ to that
word. Since some of the elements sampled at recall will have been previously
conditioned, after one recall we have (because of our assumption of
independence between successive samples):

$$\begin{aligned}
\tau_1 &= \begin{pmatrix} \text{Elements conditioned} \\ \text{during presentation} \end{pmatrix} + \begin{pmatrix} \text{Elements conditioned} \\ \text{during the recall} \end{pmatrix} - \begin{pmatrix} \text{Common} \\ \text{elements} \end{pmatrix} \\
&= a + a - a^2 = 1 - (1 - a)^2 .
\end{aligned}$$

This quantity gives us the transitional probability $\tau_1$ of going from $A_1$ to
$A_2$, from the first to the second recall. The second time a word is recalled
another independent random sample of measure $a$ is drawn and conditioned,
so we have

$$\tau_2 = [1 - (1 - a)^2] + a - a[1 - (1 - a)^2] = 1 - (1 - a)^3 .$$

Continuing in this way generates the relation (13).

With this substitution the general difference equation (1) becomes

$$p(A_k, n + 1) = p(A_k, n)(1 - a)^{k+1} + p(A_{k-1}, n)[1 - (1 - a)^k].$$

The solution of this difference equation can be obtained by the general method
outlined in Appendix A or by the appropriate substitution for $\tau_k$ in (2).
The solution is

$$p(A_0, n) = (1 - a)^n ,$$

$$p(A_k, n) = (1 - a)^{n-k} \prod_{i=0}^{k-1} \left[ 1 - (1 - a)^{n-i} \right]. \tag{14}$$


From definition (5) it is possible to obtain the following recursive
expression for the recall on trial $n + 1$ (see Appendix B):

$$\rho_{n+1} = a + (1 - a)[1 - (1 - a)^n]\rho_n . \tag{15}$$

The variance of the recall score, $r_{n+1}$, is

$$\operatorname{Var}(r_{n+1}) = \frac{1}{aN} (\rho_{n+2} - \rho_{n+1}). \tag{16}$$
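
The one-parameter learning curve and its standard-deviation band follow from (15) and (16) by simple iteration. The sketch below uses $a = 0.22$, $N = 64$ words, and 32 trials, the figures given in the text for the illustrative subject; the variable names are otherwise arbitrary.

```python
a = 0.22          # sampling parameter reported in the text
N = 64            # number of words in the list
n_trials = 32     # number of presentations

# Recursion (15): rho[n] holds rho_n, starting from rho_0 = 0.
rho = [0.0]
for n in range(n_trials + 1):
    rho.append(a + (1 - a) * (1 - (1 - a) ** n) * rho[n])

# Variance (16) of r_{n+1}, converted to a standard deviation, for n = 0..n_trials-1.
sd = [(max(0.0, rho[n + 2] - rho[n + 1]) / (a * N)) ** 0.5 for n in range(n_trials)]

for n in range(n_trials):
    print(f"trial {n + 1:2d}: rho = {rho[n + 1]:.4f}, "
          f"band = [{rho[n + 1] - sd[n]:.4f}, {rho[n + 1] + sd[n]:.4f}]")
```

On the first trial the variance reduces to $a(1 - a)/N$, the ordinary binomial value, because every word is then in state $A_0$.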


In order to illustrate the application of these equations, we have taken
the data from one subject in an experiment by J. S. Bruner and C. Zimmerman
(unpublished). In their experiment a list of 64 monosyllabic English words
was read aloud to the subject. At the end of each reading the subject wrote
all of the words he could remember. The order of the words was scrambled
before each reading. A total of 32 presentations of the list was given.

From the detailed analysis of the estimates of $\tau_k$ derived from this
subject's data it was determined that a value of $a = 0.22$ would provide a
good fit. In Figure 1 the values of $\rho_{n+1}$ computed from (15) are given by the
solid function. The data are shown by the open circles. The dotted lines
are drawn ± one standard deviation from $\rho_{n+1}$ as computed from the variance
in (16). The single parameter gives a reasonably adequate description of
these data, at least through the first 20 trials. From the 20th trial on,
however, it seems that the subject "forgets as fast as he learns." He seems to
reach an asymptote somewhat below the theoretical value at unity. The
introduction of an asymptote less than unity will be discussed in connection
with the three-parameter case.

FIGURE 1
Comparison of Theoretical and Observed Values of $\rho_n$ for the One-Parameter Case. Dotted
line is drawn ± one standard deviation from $\rho_n$. (Abscissa: trial number.)
Figure 2
Comparison of Theoretical and Observed Values of $p(A_k, n)$ for the One-Parameter Case.
(Curves are labeled by $k$; $a = 0.22$; ordinate scaled from 0 to 100; abscissa: trial number.)


As a further check on the correspondence of theory and data, Figure 2
shows the predicted and observed values of $p(A_k, n)$ as a function of $n$, for
$k = 0, 1, 2, 3$.


Second Case: Two Parameters


In the one-parameter form of the theory it is assumed that the
proportion of elements sampled during the presentation of the list is the same as
the proportion sampled during each recall. Most data are not adequately
described by such a simple model. At the very least, then, it is necessary to
consider the situation when these two sampling constants are different. In
order to introduce the second parameter, we phrase restriction (6) in the
following form:

$$\tau_0 = p_0 ,$$

$$\tau_{k+1} = a + (1 - a)\tau_k , \tag{17}$$

where $p_0$ is the proportion of elements conditioned during the presentation.
The solution of this difference equation can be written

$$\tau_k = 1 - (1 - p_0)(1 - a)^k . \tag{18}$$

On the first presentation of the list a random sample of measure $p_0$ is
conditioned to every word. When a word is recalled, a random sample of
measure $a$ is drawn and conditioned. After one recall, therefore, the measure
of conditioned elements is

$$\tau_1 = p_0 + a - a p_0 = 1 - (1 - p_0)(1 - a).$$

After two recalls the measure of conditioned elements is

$$\begin{aligned}
\tau_2 &= [1 - (1 - p_0)(1 - a)] + a - a[1 - (1 - p_0)(1 - a)] \\
&= 1 - (1 - p_0)(1 - a)^2 .
\end{aligned}$$

Continuing in this way generates the relation (18).

With this substitution the general difference equation (1) becomes

$$p(A_k, n + 1) = p(A_k, n)(1 - p_0)(1 - a)^k + p(A_{k-1}, n)[1 - (1 - p_0)(1 - a)^{k-1}]. \tag{19}$$
The solution of (19) is

$$p(A_0, n) = (1 - p_0)^n ,$$

$$p(A_k, n) = (1 - p_0)^{n-k} \prod_{i=0}^{k-1} \frac{[1 - (1 - p_0)(1 - a)^i][1 - (1 - a)^{n-i}]}{1 - (1 - a)^{i+1}} . \tag{20}$$

When $p_0 = a$, (20) reduces to (14).

The recursive form for the recall now becomes (see Appendix B)

$$\rho_{n+1} = p_0 + (1 - p_0)[1 - (1 - a)^n]\rho_n . \tag{21}$$

The variance of $r_{n+1}$ is

$$\operatorname{Var}(r_{n+1}) = \frac{1}{aN} (\rho_{n+2} - \rho_{n+1}). \tag{22}$$
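
For the two-parameter case the closed form (20) and the recursion (21) can be checked against each other through definition (5). The sketch below uses $a = 0.10$ and $p_0 = 0.27$, the values quoted in the text for the first illustrative subject; the function names and the number of trials examined are assumptions.

```python
p0, a = 0.27, 0.10       # parameter values quoted in the text
n_trials = 12            # number of trials examined here (arbitrary)


def tau(k):
    """Transitional probability (18)."""
    return 1 - (1 - p0) * (1 - a) ** k


def p_closed(k, n):
    """State probability p(A_k, n) from the product form (20)."""
    if k > n:
        return 0.0
    value = (1 - p0) ** (n - k)
    for i in range(k):
        value *= ((1 - (1 - p0) * (1 - a) ** i) * (1 - (1 - a) ** (n - i))
                  / (1 - (1 - a) ** (i + 1)))
    return value


def rho_direct(n):
    """rho_{n+1} from definition (5), using (18) and (20)."""
    return sum(tau(k) * p_closed(k, n) for k in range(n + 1))


# Recursion (21): rho[n] holds rho_n, starting from rho_0 = 0.
rho = [0.0]
for n in range(n_trials):
    rho.append(p0 + (1 - p0) * (1 - (1 - a) ** n) * rho[n])

gaps = [abs(rho[n + 1] - rho_direct(n)) for n in range(n_trials)]
sums = [sum(p_closed(k, n) for k in range(n + 1)) for n in range(n_trials)]
print("largest gap between (21) and (5):", max(gaps))
```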


In order to illustrate the application of these equations we have selected
two sets of data. The first set was collected by Bruner and Zimmerman. A
list of 32 monosyllabic words was read aloud. At the end of each reading the
subject wrote all of the words he could remember. The order of the words
was scrambled before every reading. A total of 32 presentations of the list
was given.

From the analysis of the $t_k$ calculated for this particular subject it was
found that $a = 0.10$ and $p_0 = 0.27$ gave a good description of the data. In
Figure 3 the values of $\rho_{n+1}$ computed from (21) are shown by the solid
function. The data are given by the open circles. The dotted lines are drawn
± one standard deviation from $\rho_{n+1}$ as computed from (22). As a further
check, Figure 4 shows the predicted and observed values of $p(A_k, n)$ as a
function of $n$ for $k = 0, 1, 2, 3$.

The distribution of cumulative recalls on any given trial provides still
another way of viewing the data. In Figure 5, the cumulative distribution
of $k$, the number of recalls, is shown for trials 5, 10, 15, 20. The proportion
of test words recalled $k$ times or less is plotted for comparison on each trial.

The second set of data was collected by M. Levine. He read aloud a
100-word anecdote. At the end of the reading, the subject wrote down all
he could remember. Four such trials were given. The order of the words
was not scrambled during the interval between trials.

From the analysis of the data for this particular subject it was found
that $a = 0.87$ and $p_0 = 0.61$ gave a good description of the results. Figure 6
shows the comparison of theory and experiment both for $\rho_{n+1}$ and for $p(A_k, n)$
for $k = 0, 1, 2$.

As a general observation, we have noted that when the order of the words
is not scrambled between trials, the parameter $a$ is relatively large. That
is to say, when the words are not scrambled, there is a much higher probability
that the same words will be recalled on successive trials. This effect is related
to the serial-position curve. The subject recalls words at the beginning and
at the end of the list. If these words remain in their favored positions, they
continue to be recalled. New words are added to those recalled at the ends
at a rate determined by $p_0$, so the learning works from the two ends toward
the middle, which is the last to be learned. This effect has been noted with
lists of randomly selected English words as well as with anecdotes.


Third Case: Three Parameters 

In the one- and two-parameter cases we have assumed that after sufficient
practice the subject should eventually reach perfect performance. Some data,
however, seem to evade this simple assumption and so it is necessary to
consider what happens when a lower asymptote is introduced. Such a parameter
may be necessary when, for example, the period of time allowed for recall is
limited.

To introduce the third parameter we adopt the general restriction (6)

$$\tau_0 = p_0 ,$$

$$\tau_{k+1} = a + \alpha \tau_k , \quad \text{where } 0 < a \le 1 - \alpha < 1. \tag{23}$$

The solution of (23) can be written

$$\tau_k = \frac{a}{1 - \alpha} - \left( \frac{a}{1 - \alpha} - p_0 \right) \alpha^k . \tag{24}$$

When $\alpha = 1 - a$, (24) reduces to (18). From (24) we see that as $k$ increases
without limit, $\tau_k$ approaches $a/(1 - \alpha)$ as an asymptote. From (5') we know
that $\tau_k$ and $\rho_{n+1}$ approach the same asymptotic value, $m$. So we have the
equation

$$m = \frac{a}{1 - \alpha} . \tag{25}$$

Since $1 - \alpha \ge a$, $m$ cannot exceed unity; and since both $a > 0$ and $1 - \alpha > 0$,
$m$ cannot be negative. In general, we are interested in cases where
$m > p_0$, for if $p_0 > m$, we obtain forgetting rather than acquisition.

Figure 3
Comparison of Theoretical and Observed Values of $\rho_n$ for a Two-Parameter Case. Dotted
lines are drawn ± one standard deviation from $\rho_n$.

Figure 4
Comparison of Theoretical and Observed Values of $p(A_k, n)$ for a Two-Parameter Case

A set-theoretical rationalization for (24) runs as follows. On the
presentation of the material a random sample of elements of measure $p_0$ is
conditioned for every word. At the first recall a sample of measure $1 - \alpha$
is drawn. Of these elements, a portion of measure $a$ is conditioned and the
remainder, $1 - \alpha - a$, are extinguished. We add the conditioned elements
as before, but now we must subtract the measure of the elements conditioned
during presentation and extinguished during recall, i.e., $(1 - \alpha - a) p_0$. Thus
we have

$$\begin{aligned}
\tau_1 &= p_0 + a - a p_0 - (1 - \alpha - a) p_0 \\
&= m - (m - p_0)\alpha .
\end{aligned}$$

At the second recall the same sampling procedure is repeated:

$$\begin{aligned}
\tau_2 &= \tau_1 + a - a \tau_1 - (1 - \alpha - a)\tau_1 \\
&= a + \alpha \tau_1 = m - (m - p_0)\alpha^2 .
\end{aligned}$$

Continuing in this way generates the relation (24).

When (24) is substituted into (1), we obtain the appropriate difference
equation, but its solution for the three-parameter case is hardly less
cumbersome than (2). It would appear that the simplest way to work with these
equations is to take advantage of our solution of the two-parameter case.

First, we introduce a new transitional probability, $\tau'_k$, such that

$$\tau'_k = \frac{\tau_k}{m} = 1 - \left( 1 - \frac{p_0}{m} \right) \alpha^k , \quad \text{for } p_0 < m. \tag{26}$$

This new variable is now the same as in the case of two parameters given in
(18), with substitution of $p_0/m$ for $p_0$ and $\alpha$ for $(1 - a)$. Therefore, from
(2) and (20), we know that

$$\begin{aligned}
\tau'_0 \tau'_1 \cdots \tau'_{k-1} \sum_{i=0}^{k} \frac{(1 - \tau'_i)^n}{\prod_{j=0,\ j \neq i}^{k} (\tau'_j - \tau'_i)}
&= \left( 1 - \frac{p_0}{m} \right)^{n-k} \prod_{i=0}^{k-1} \frac{[1 - (1 - p_0/m)\alpha^i][1 - \alpha^{n-i}]}{1 - \alpha^{i+1}} \\
&= p'(A_k, n).
\end{aligned} \tag{27}$$

When $m \tau'_k$ is substituted into (2), the factor $m^k$ in the product in front
of the summation cancels the factor $m^k$ in the denominator under the
summation. Thus we know that

$$p(A_k, n) = \tau'_0 \tau'_1 \cdots \tau'_{k-1} \sum_{i=0}^{k} \frac{(1 - \tau_i)^n}{\prod_{j=0,\ j \neq i}^{k} (\tau'_j - \tau'_i)} , \tag{28}$$

which is the same as $p'(A_k, n)$ in (27) except for the numerator under the
summation. This numerator can be written

$$\begin{aligned}
(1 - \tau_i)^n &= [(1 - m) + m(1 - \tau'_i)]^n \\
&= (1 - m)^n + n(1 - m)^{n-1} m (1 - \tau'_i) \\
&\quad + \binom{n}{2} (1 - m)^{n-2} m^2 (1 - \tau'_i)^2 + \cdots + m^n (1 - \tau'_i)^n .
\end{aligned} \tag{29}$$

Now we substitute this sequence for the numerator in (28) and sum term by
term. When we consider the last term of this sequence we have

$$\tau'_0 \tau'_1 \cdots \tau'_{k-1} \sum_{i=0}^{k} \frac{m^n (1 - \tau'_i)^n}{\prod_{j=0,\ j \neq i}^{k} (\tau'_j - \tau'_i)} ,$$

which we know from (27) is equal to $m^n p'(A_k, n)$. The next to last term
gives

$$\tau'_0 \tau'_1 \cdots \tau'_{k-1} \sum_{i=0}^{k} \frac{n(1 - m) m^{n-1} (1 - \tau'_i)^{n-1}}{\prod_{j=0,\ j \neq i}^{k} (\tau'_j - \tau'_i)} ,$$

which we know from (27) is equal to $n(1 - m) m^{n-1} p'(A_k, n - 1)$.
Proceeding in this manner brings us eventually to the case where $n < k$, and then
we know the term is zero. Consequently, we can write

$$\begin{aligned}
p(A_k, n) &= m^n p'(A_k, n) + n(1 - m) m^{n-1} p'(A_k, n - 1) + \cdots \\
&\quad + \binom{n}{n-k} (1 - m)^{n-k} m^k p'(A_k, k) \\
&= \sum_{i=k}^{n} \binom{n}{i} m^i (1 - m)^{n-i} p'(A_k, i).
\end{aligned} \tag{30}$$

When the asymptote is unity ($m = 1$), (29) and (30) reduce to the
two-parameter case.

We recall that because of the way in which our probabilities were
defined in (1), (30) can be written as

$$p(A_k, n) = \sum_{i=0}^{n} \binom{n}{i} m^i (1 - m)^{n-i} p'(A_k, i).$$

Now it is not difficult to find an expression for $\rho_{n+1}$ in terms of the $\rho'_{i+1}$ computed
in the two-parameter case:

$$\begin{aligned}
\rho_{n+1} &= \sum_{k=0}^{n} \tau_k\, p(A_k, n) \\
&= m \sum_{k=0}^{n} \tau'_k\, p(A_k, n) \\
&= m \sum_{k=0}^{n} \sum_{i=0}^{n} \binom{n}{i} m^i (1 - m)^{n-i} \tau'_k\, p'(A_k, i).
\end{aligned}$$

If we invert the order of summation, we find that

$$\begin{aligned}
\rho_{n+1} &= m \sum_{i=0}^{n} \binom{n}{i} m^i (1 - m)^{n-i} \sum_{k=0}^{n} \tau'_k\, p'(A_k, i) \\
&= m \sum_{i=0}^{n} \binom{n}{i} m^i (1 - m)^{n-i} \rho'_{i+1} .
\end{aligned} \tag{31}$$

The computation of $\rho_{n+1}$ by this method involves two steps: first, the values
of $\rho'_{i+1}$ are calculated as in the two-parameter case with the substitution
indicated in (26); second, these values of $\rho'_{i+1}$ are weighted by the binomial
expansion of $[m + (1 - m)]^n$ and then summed according to (31).
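
The two-step computation in (31) can be compared with a direct iteration of (1) under the three-parameter form (24). The sketch below uses $m = 0.7$, $p_0 = 0.13$, and $\alpha = 0.83$, the values reported for the numerical example, with $a$ implied by $a = m(1 - \alpha)$; the function name `recall_curve` is an assumption.

```python
from math import comb

m, p0, alpha = 0.7, 0.13, 0.83   # values reported for the numerical example
n_trials = 18


def recall_curve(tau_of_k, n_trials):
    """Return [rho_1, ..., rho_{n_trials}] by iterating (1) and summing as in (5)."""
    tau = [tau_of_k(k) for k in range(n_trials + 1)]
    p = [1.0] + [0.0] * n_trials
    out = []
    for _ in range(n_trials):
        out.append(sum(t * x for t, x in zip(tau, p)))
        p = [p[k] * (1 - tau[k]) + (p[k - 1] * tau[k - 1] if k > 0 else 0.0)
             for k in range(n_trials + 1)]
    return out


# Direct three-parameter computation with tau_k from (24), written with m = a/(1 - alpha).
rho_direct = recall_curve(lambda k: m - (m - p0) * alpha ** k, n_trials)

# Step 1: primed (two-parameter) recall scores with tau'_k from (26).
rho_prime = recall_curve(lambda k: 1 - (1 - p0 / m) * alpha ** k, n_trials)

# Step 2: binomial weighting according to (31); rho_prime[i] holds rho'_{i+1}.
rho_mixed = [
    m * sum(comb(n, i) * m ** i * (1 - m) ** (n - i) * rho_prime[i]
            for i in range(n + 1))
    for n in range(n_trials)
]

for n in range(n_trials):
    print(f"trial {n + 1:2d}: direct = {rho_direct[n]:.4f}, via (31) = {rho_mixed[n]:.4f}")
```

The binomial weights have a simple reading: on each trial the primed (two-parameter) process advances with probability $m$ and stands still otherwise.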

These computations can be abbreviated somewhat by using an
approximation developed by Bush and Mosteller (personal communication). It is

$$\begin{aligned}
\rho_{n+2} &= (2 + a + 2a\alpha)\rho_{n+1} - [a^2(1 - \alpha) + (1 + a)(1 + 2a\alpha)]\rho_n \\
&\quad + 3(1 - \alpha^2)(1 + a)\rho_n^2 - 2(1 - \alpha)(1 - \alpha^2)\rho_n^3 - 3(1 - \alpha^2)\rho_n \rho_{n+1}
\qquad (n \ge 1).
\end{aligned} \tag{32}$$

The approximation involves permitting the third moment of the distribution
of the $\tau_k$ around $\rho_n$ to go to zero on every trial.
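
The approximation can be tried out against the exact recall scores. The sketch below assumes the coefficients that follow from combining (7) with the corresponding recursion for the second raw moment and setting the third central moment to zero; the parameter values are those reported for the numerical example ($m = 0.7$, $p_0 = 0.13$, $\alpha = 0.83$, $a = m(1 - \alpha)$), and the function name is hypothetical.

```python
m, p0, alpha = 0.7, 0.13, 0.83   # values reported for the numerical example
a = m * (1 - alpha)              # relation a = m(1 - alpha)
n_trials = 18


def exact_recall(n_trials):
    """Return [rho_1, ..., rho_{n_trials}] by iterating (1) and summing as in (5)."""
    tau = [m - (m - p0) * alpha ** k for k in range(n_trials + 1)]
    p = [1.0] + [0.0] * n_trials
    out = []
    for _ in range(n_trials):
        out.append(sum(t * x for t, x in zip(tau, p)))
        p = [p[k] * (1 - tau[k]) + (p[k - 1] * tau[k - 1] if k > 0 else 0.0)
             for k in range(n_trials + 1)]
    return out


exact = exact_recall(n_trials)

# Coefficients of the approximate recursion (zero third central moment).
c1 = 2 + a + 2 * a * alpha
c2 = a ** 2 * (1 - alpha) + (1 + a) * (1 + 2 * a * alpha)
c3 = 3 * (1 - alpha ** 2) * (1 + a)
c4 = 2 * (1 - alpha) * (1 - alpha ** 2)
c5 = 3 * (1 - alpha ** 2)

approx = exact[:2]                # rho_1 and rho_2 are taken from the exact values
while len(approx) < n_trials:
    r_n, r_next = approx[-2], approx[-1]          # rho_n and rho_{n+1}
    approx.append(c1 * r_next - c2 * r_n + c3 * r_n ** 2
                  - c4 * r_n ** 3 - c5 * r_n * r_next)

for trial, (e, x) in enumerate(zip(exact, approx), start=1):
    print(f"trial {trial:2d}: exact = {e:.4f}, approximate = {x:.4f}")
```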
The variance of $r_{n+1}$ in the three-parameter case is

$$\operatorname{Var}(r_{n+1}) = \frac{1}{N(1 - \alpha)} \left\{ \rho_{n+2} - [1 - (1 - \alpha)(1 - m)]\rho_{n+1} \right\}. \tag{33}$$

This expression for the variance of $r_{n+1}$ follows directly from (7) and (10').
It is easily seen that (10') can be written as follows:

$$\sum_{k=0}^{n} \tau_k^2\, p(A_k, n) = \rho_{n+1} - N \operatorname{Var}(r_{n+1}). \tag{34}$$

Substituting (34) in (7) and solving for $\operatorname{Var}(r_{n+1})$ we find that

$$\operatorname{Var}(r_{n+1}) = \frac{1}{N(1 - \alpha)} [\rho_{n+2} - (a + \alpha)\rho_{n+1}],$$

which, except for notation, is (33). The one-parameter and two-parameter
variances (16) and (22) are special cases of this expression.

It is of interest to observe that when the limiting value, $m$, is substituted
in (33) for $\rho_{n+2}$ and $\rho_{n+1}$, the limiting variance is found to be binomial. That
is,

$$\lim_{n \to \infty} \operatorname{Var}(r_{n+1}) = \frac{m(1 - m)}{N} .$$

This reflects the fact, established earlier in (5'), that as $n$ grows very large
the variance of the $\tau_k$ around $m$ goes to zero.

In order to obtain a numerical example, we have taken the data from
another subject in the experiment by Bruner and Zimmerman. Sixty-four
monosyllabic English words were read aloud and the order of the words was
scrambled before every presentation. A visual inspection of the data led us
to choose an asymptote in the neighborhood of 0.7. This asymptote is drawn
on the plot of the $t_k$ in Figure 7 and on the plot of the $r_n$ in Figure 8. Then we
estimated $p_0 = 0.13$ by considering all the trials on which words were in
state $A_0$ and calculating $p_0$ as the weighted average of the $t_{0,n+1}$ for all those
trials. Next we estimated the sampling parameter $\alpha = 0.83$. This was
done by obtaining the estimates, $t_k$, for successive values of $k$; these estimates,
together with (24), give us a set of equations estimating $\alpha$. We used the
weighted average of these estimates (ignoring negative values). Then we
obtained $a = 0.12$ from the equation $a = m(1 - \alpha)$. We shall comment on
the estimation problems later.

Figure 7
Transitional Probability of Recall, $\tau_k$, as a Function of Number of Recalls in the
Three-Parameter Case. Values of $t_k$ are indicated by open circles. The curve fitted to
the $t_k$ is $\tau_k = 0.7 - 0.57(0.83)^k$. (Abscissa: cumulative number of recalls, $k$.)

When these parameter values were substituted into (24) we obtained the
function for $\tau_k$ shown in Figure 7. When the values were substituted into (28)
for $k = 1, 2, 3, 4$, we obtained the functions for $p(A_k, n)$ shown in Figure 9.
When they were substituted into (31) we obtained the function for $\rho_n$ shown
in Figure 8. In Figure 8 the dotted lines are drawn ± one standard deviation
from $\rho_n$, as computed from (33).

A comparison of the values of $\rho_n$ computed from (31) and from (32) is
given for the first eighteen trials in Table 1. With this choice of parameters
the Bush-Mosteller approximation seems highly satisfactory.

















TABLE 1

Comparison of Exact and Approximate Values of $\rho_n$ for First 18 Trials

Trial   Exact   Approximate     Trial   Exact   Approximate
  1     .1300     .1300          10     .2663     .2655
  2     .1426     .1426          11     .2837     .2827
  3     .1559     .1559          12     .3014     .3000
  4     .1700     .1700          13     .3191     .3174
  5     .1847     .1846          14     .3369     .3347
  6     .2000     .1999          15     .3546     .3520
  7     .2159     .2157          16     .3722     .3692
  8     .2323     .2319          17     .3896     .3862
  9     .2491     .2486          18     .4067     .4030

Discussion 


In the preceding pages we have made the explicit assumption that the 
several words being memorized simultaneously are independent, that memor- 
izing one word does not affect the probability of recalling another word on the 
list. The assumption can be justified only by its mathematical convenience, 
because the data uniformly contradict it. The learner’s introspective report 
is that groups of words go together to form associated clusters, and this 
impression is supported in the data by the fact that many pairs of words 
are recalled together or omitted together on successive trials. If the theory 
is used to describe the behavior of 50 rats, independence is a reasonable 
assumption. But when the theory describes the behavior of 50 words in a 
list that a single subject must learn, independence is not a reasonable as- 
sumption. It is important, therefore, to examine the consequences of intro- 
ducing covariance. 

The difference between the independent and the dependent versions of 
the theory can best be illustrated in terms of the set-theoretical interpretation 
of the two-parameter case. Imagine that we have a large ledger with 1000 
pages. The presentation of the list is equivalent to writing each of the words 
at random on 100 pages. Thus $p_0$ = 100/1000 = 0.1. Now we select a page
at random. On this page we find written the words A, B, and C. These
are responses on the first trial. The rule is that each of these words must
be written on 50 pages selected at random. Thus a = 50/1000 = 0.05. With
the independent model we would first select 50 pages at random and make 
sure that word A was written on all of them, then select 50 more pages in- 
dependently for B, and 50 more for C. With a dependent model, however, 
we could simply make one selection of 50 pages at random and write all three 
words, A, B, and C, on the same sample of 50 pages. Then whenever A was 
recalled again it would be likely that B and C would also be recalled at the 
same time. 

The probability that a word will be recalled depends upon the measure
of the elements conditioned to it (the number of pages in the ledger on which
it is inscribed) and does not depend upon what other words are written on the
same pages. Therefore, the introduction of covariance in this way does not
change the theoretical recall, $\rho_{n+1}$. The only effect is to increase the variance
of the estimates of $\rho_{n+1}$. In other words, it is not surprising that the equa-
tions give a fair description of the recall scores even though no attention
was paid to the probabilities of joint occurrences of pairs of words. Associa-
tive clustering should affect the variability, not the rate, of memorization.
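
The ledger picture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (new_ledger, reinforce, run_trial, recall_probability) are hypothetical, and the constants are the ones used in the ledger example (1000 pages, 100 initial pages, 50 pages per recall). The only difference between the two versions is whether the recalled words share one sample of 50 pages or draw separate samples.

```python
import random

PAGES = 1000     # pages in the ledger
INITIAL = 100    # pages each word starts on, so p_0 = 100/1000
INCREMENT = 50   # pages written after each recall, so a = 50/1000


def new_ledger(words, rng):
    """Write every word on INITIAL pages chosen at random."""
    return {w: set(rng.sample(range(PAGES), INITIAL)) for w in words}


def reinforce(ledger, recalled, rng, dependent):
    """Write each recalled word on INCREMENT pages chosen at random."""
    shared = set(rng.sample(range(PAGES), INCREMENT))
    for w in recalled:
        if dependent:
            # Dependent version: one sample serves the whole cluster.
            ledger[w] |= shared
        else:
            # Independent version: a fresh sample for every word.
            ledger[w] |= set(rng.sample(range(PAGES), INCREMENT))


def run_trial(ledger, rng, dependent):
    """Open one page at random, recall the words on it, then reinforce them."""
    page = rng.randrange(PAGES)
    recalled = [w for w, pages in ledger.items() if page in pages]
    reinforce(ledger, recalled, rng, dependent)
    return recalled


def recall_probability(ledger, word):
    """Recall probability of a word: the fraction of pages that carry it."""
    return len(ledger[word]) / PAGES
```

In both versions a recalled word gains pages in the same way, so its own recall probability follows the same law; in the dependent version the recalled words end up on the same pages, which is what produces joint recall on later trials.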

The parameters a, $p_0$, and $\alpha$, obtained from the linear difference equa-
tion (6), are assumed to describe each word in the list. Thus data from
different words may be combined to estimate the various $\tau_k$. If the para-
meters vary from word to word, $\rho_{n+1}$ is only an approximation of the mean
probability of recall determined by averaging the recall probabilities of all
the words. Similarly, the expressions given for $\rho_{n+1}$ cannot be expected to
describe the result of averaging several subjects' data together unless all
subjects are known to have the same values of the parameters.

The general theory, of course, is not limited to linear restrictions of
the form of (6). The data or the theory may force us to consider more com-
plicated functions for $\tau_k$. For all such cases the general solution (2) is
applicable, though tedious to use, and will enable us to compute the necessary
values of $p(A_k, n)$.

Once a descriptive model of this sort has been used to tease out the 
necessary parameters, the next step is to vary the experimental conditions 
and to observe the effects upon these parameters. In order to take this next 
step, however, we need efficient methods of estimating the parameters from 
the data. As yet we have found no satisfactory answers to the estimation
problem.

There is a sizeable amount of computation involved in determining the
functions $p(A_k, n)$ and $\rho_n$. If a poor choice of the parameters a, $p_0$, and
$\alpha$ is made at the outset, it takes several hours to discover the fact. In the
example in the preceding section, we estimated the parameters successively
and used different parts of the data for the different estimates. After $\rho_n$
had been computed it seemed to us that our estimates of $p_0$ and m were both
too low. Clearly, the method we have used to fit the theory to the data is
not a particularly good one. We have considered least squares in order to 
use all of the data to estimate all parameters simultaneously. We convinced 
ourselves that the problem was beyond our abilities. Consequently, we must 
leave the estimation problem with the pious hope that it will appeal to some- 
one with the mathematical competence to solve it. 


Appendix A 


Solution for $p(A_k, n)$ in the General Case


The solution of equation (1) with the boundary conditions we have 
enumerated has been obtained several times in the past (4, 5). We present 
below our own method of solution because the procedures involved may be 
of interest in other applications. 

Equation (1) may be written explicitly as follows:

\[
\begin{aligned}
(1 - \tau_0)\,p(A_0, n) &= p(A_0, n+1) \\
\tau_0\,p(A_0, n) + (1 - \tau_1)\,p(A_1, n) &= p(A_1, n+1) \\
\tau_1\,p(A_1, n) + (1 - \tau_2)\,p(A_2, n) &= p(A_2, n+1) \\
&\;\;\vdots
\end{aligned}
\]

This system of equations can be written in matrix notation as follows:

\[
\begin{bmatrix}
1-\tau_0 & 0 & 0 & 0 & \cdots \\
\tau_0 & 1-\tau_1 & 0 & 0 & \cdots \\
0 & \tau_1 & 1-\tau_2 & 0 & \cdots \\
0 & 0 & \tau_2 & 1-\tau_3 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix}
p(A_0, n) \\ p(A_1, n) \\ p(A_2, n) \\ p(A_3, n) \\ \vdots
\end{bmatrix}
=
\begin{bmatrix}
p(A_0, n+1) \\ p(A_1, n+1) \\ p(A_2, n+1) \\ p(A_3, n+1) \\ \vdots
\end{bmatrix}
\]

This infinite matrix of transitional probabilities we shall call T, and the
infinite column vectors made up of the state probabilities on trial n and
n + 1 we shall call $d_n$ and $d_{n+1}$. So we can write

\[ T d_n = d_{n+1}. \]


The initial distribution of state probabilities, $d_0$, is the infinite column vector
{1, 0, 0, 0, ...}. The state probabilities on trial one are then given by

\[ T d_0 = d_1. \]

The state probabilities on trial two are given by

\[ T d_1 = d_2, \]

so by substitution,

\[ T d_1 = T(T d_0) = T^2 d_0 = d_2. \]

Continuing this procedure gives the general relation

\[ T^n d_0 = d_n. \]

Therefore, the problem of determining $d_n$ can be equated to the problem of
determining $T^n$.

Since T is a semi-matrix, we know that it can be expressed as

\[ T = S D S^{-1}, \]

where D is an infinite diagonal matrix with the same elements on its diagonal
as are on the main diagonal of T (e.g., 2). The diagonal elements of S are
arbitrary, so we let $S_{ii} = 1$. Now we can write


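\[ TS = SD, \]

\[
T
\begin{bmatrix}
1 & 0 & 0 & \cdots \\
S_{21} & 1 & 0 & \cdots \\
S_{31} & S_{32} & 1 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & \cdots \\
S_{21} & 1 & 0 & \cdots \\
S_{31} & S_{32} & 1 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix}
1-\tau_0 & 0 & 0 & \cdots \\
0 & 1-\tau_1 & 0 & \cdots \\
0 & 0 & 1-\tau_2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\]

Now it is a simple matter to solve for $S_{ij}$ term by term. For example, to
solve for $S_{21}$ we construct (from row 2 and column 1) the equation

\[ \tau_0 + (1 - \tau_1) S_{21} = S_{21}(1 - \tau_0), \]

which gives

\[ S_{21} = \tau_0 / (\tau_1 - \tau_0). \]

To solve for $S_{31}$, we use the equation

\[
\begin{aligned}
\tau_1 S_{21} + (1 - \tau_2) S_{31} &= S_{31}(1 - \tau_0), \\
S_{31} &= \tau_1 S_{21} / (\tau_2 - \tau_0) \\
       &= \tau_0 \tau_1 / [(\tau_1 - \tau_0)(\tau_2 - \tau_0)].
\end{aligned}
\]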
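Proceeding in this manner gives the necessary elements of S, and we have

\[
S = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots \\
\dfrac{\tau_0}{\tau_1-\tau_0} & 1 & 0 & 0 & \cdots \\
\dfrac{\tau_0\tau_1}{(\tau_1-\tau_0)(\tau_2-\tau_0)} & \dfrac{\tau_1}{\tau_2-\tau_1} & 1 & 0 & \cdots \\
\dfrac{\tau_0\tau_1\tau_2}{(\tau_1-\tau_0)(\tau_2-\tau_0)(\tau_3-\tau_0)} & \dfrac{\tau_1\tau_2}{(\tau_2-\tau_1)(\tau_3-\tau_1)} & \dfrac{\tau_2}{\tau_3-\tau_2} & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\]

The elements of $S^{-1}$ can be obtained term by term from the equation
$S S^{-1} = I$. For example, the element $S^{21}$ of $S^{-1}$ is given by row two of S
times column one of $S^{-1}$: $\tau_0/(\tau_1 - \tau_0) + S^{21} = 0$. Continuing in this way
we have

\[
S^{-1} = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots \\
\dfrac{\tau_0}{\tau_0-\tau_1} & 1 & 0 & 0 & \cdots \\
\dfrac{\tau_0\tau_1}{(\tau_0-\tau_2)(\tau_1-\tau_2)} & \dfrac{\tau_1}{\tau_1-\tau_2} & 1 & 0 & \cdots \\
\dfrac{\tau_0\tau_1\tau_2}{(\tau_0-\tau_3)(\tau_1-\tau_3)(\tau_2-\tau_3)} & \dfrac{\tau_1\tau_2}{(\tau_1-\tau_3)(\tau_2-\tau_3)} & \dfrac{\tau_2}{\tau_2-\tau_3} & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\]

These matrices permit a simple representation of the powers of the
matrix T. Thus,

\[ T^2 = (SDS^{-1})(SDS^{-1}) = SD(S^{-1}S)DS^{-1} = SD^2S^{-1}, \]

and in general,

\[ T^n = S D^n S^{-1}. \]

Since D is a diagonal matrix, $D^n$ is obtained by taking the nth power of every
diagonal element. When this equation for $T^n$ is multiplied through, we obtain

\[
T^n = \begin{bmatrix}
(1-\tau_0)^n & 0 & 0 & \cdots \\
\tau_0\left[\dfrac{(1-\tau_0)^n}{\tau_1-\tau_0} + \dfrac{(1-\tau_1)^n}{\tau_0-\tau_1}\right] & (1-\tau_1)^n & 0 & \cdots \\
\tau_0\tau_1\left[\dfrac{(1-\tau_0)^n}{(\tau_1-\tau_0)(\tau_2-\tau_0)} + \dfrac{(1-\tau_1)^n}{(\tau_0-\tau_1)(\tau_2-\tau_1)} + \dfrac{(1-\tau_2)^n}{(\tau_0-\tau_2)(\tau_1-\tau_2)}\right] & \tau_1\left[\dfrac{(1-\tau_1)^n}{\tau_2-\tau_1} + \dfrac{(1-\tau_2)^n}{\tau_1-\tau_2}\right] & (1-\tau_2)^n & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\]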

Since $T^n d_0$ involves only the first column of $T^n$, it is not actually necessary
to obtain more than the first columns of $S^{-1}$ and of $T^n$. We have presented
the complete solution here, however. It can be seen from inspection of the
first column of $T^n$ that (2) is the general solution:

\[
\begin{aligned}
p(A_0, n) &= (1 - \tau_0)^n, && \text{for } k = 0, \\
p(A_k, n) &= \tau_0 \tau_1 \cdots \tau_{k-1} \sum_{i=0}^{k} \frac{(1 - \tau_i)^n}{\prod_{j=0,\, j \ne i}^{k} (\tau_j - \tau_i)}, && \text{for } k > 0.
\end{aligned}
\tag{2}
\]

This general method of solution can be used for the special cases con-
sidered in this paper, with the substitution of the appropriate values for $\tau_k$.
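
As a numerical check on (2), the closed form can be compared with direct repeated application of equation (1). The sketch below is illustrative only: the function names and the sample values of tau are assumptions, and the closed form requires that the values of tau be distinct.

```python
from math import prod


def state_probs_closed_form(tau, n):
    """p(A_k, n) for k = 0..len(tau)-1 from the general solution (2).

    The tau values must be distinct, since their differences are divided by.
    """
    probs = []
    for k in range(len(tau)):
        lead = prod(tau[:k])  # tau_0 * ... * tau_{k-1}; equals 1 when k == 0
        total = 0.0
        for i in range(k + 1):
            denom = prod(tau[j] - tau[i] for j in range(k + 1) if j != i)
            total += (1 - tau[i]) ** n / denom
        probs.append(lead * total)
    return probs


def state_probs_iterated(tau, n):
    """The same probabilities, applying equation (1) n times to d_0 = (1, 0, 0, ...)."""
    d = [1.0] + [0.0] * (len(tau) - 1)
    for _ in range(n):
        nxt = [(1 - tau[0]) * d[0]]
        for k in range(1, len(tau)):
            nxt.append(tau[k - 1] * d[k - 1] + (1 - tau[k]) * d[k])
        d = nxt
    return d
```

Because the transition matrix is lower triangular, truncating it to the first few states leaves the probabilities of those states unchanged, so the two functions can be compared state by state.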


Appendix B 
Recursive Expression for $\rho_n$ in Two-Parameter Case

From (20) we obtain the recursive relation

\[
p(A_{k+1}, n+1) = \frac{[1 - (1 - p_0)(1 - a)^k]\,[1 - (1 - a)^{n+1}]}{1 - (1 - a)^{k+1}}\; p(A_k, n).
\]

Rearranging and summing, we have

\[
\sum_{k=0}^{n} \left[\frac{1 - (1 - a)^{k+1}}{1 - (1 - a)^{n+1}}\right] p(A_{k+1}, n+1)
= \sum_{k=0}^{n} [1 - (1 - p_0)(1 - a)^k]\, p(A_k, n).
\]

The right side of this equation is, from (5) and (18), $\rho_{n+1}$. The left side can
be rewritten

\[
\sum_{k=1}^{n+1} \left[\frac{1 - (1 - a)^{k}}{1 - (1 - a)^{n+1}}\right] p(A_k, n+1),
\]

which becomes on trial n (with n ≥ 1),

\[
\sum_{k=1}^{n} \left[\frac{1 - (1 - a)^{k}}{1 - (1 - a)^{n}}\right] p(A_k, n) = \rho_n .
\]

We now have, by adding and subtracting $p(A_0, n)$,

\[
\frac{1}{1 - (1 - a)^n} \left[\sum_{k=0}^{n} p(A_k, n) - \sum_{k=0}^{n} (1 - a)^k p(A_k, n)\right] = \rho_n ,
\]

\[
1 - \sum_{k=0}^{n} (1 - a)^k p(A_k, n) = [1 - (1 - a)^n]\,\rho_n .
\]

Now we know that

\[
\rho_{n+1} = 1 - (1 - p_0) \sum_{k=0}^{n} (1 - a)^k p(A_k, n),
\]

and so we obtain

\[
\rho_{n+1} = 1 - (1 - p_0)\,\{1 - [1 - (1 - a)^n]\,\rho_n\}.
\]

Rearranging terms gives

\[
\rho_{n+1} = p_0 + (1 - p_0)\,[1 - (1 - a)^n]\,\rho_n , \tag{21}
\]

which is the desired result.
From this result (15) is obtained directly by equating $p_0$ and a.
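
The recursion (21) can be checked numerically against the definition of the recall probability as the sum over states of $\tau_k\,p(A_k, n)$, with $\tau_k = 1 - (1 - p_0)(1 - a)^k$. The following sketch is an illustration only; the function names and the parameter values used to exercise it are assumptions.

```python
def recall_probs_recursive(p0, a, trials):
    """rho_1 .. rho_trials from recursion (21), starting at rho_1 = p0."""
    rho = [p0]
    for n in range(1, trials):
        rho.append(p0 + (1 - p0) * (1 - (1 - a) ** n) * rho[-1])
    return rho


def recall_probs_from_states(p0, a, trials):
    """The same quantities as sum_k tau_k * p(A_k, n), iterating equation (1)."""
    tau = [1 - (1 - p0) * (1 - a) ** k for k in range(trials + 1)]
    d = [1.0] + [0.0] * trials  # d_0: every word starts in state A_0
    out = []
    for _ in range(trials):
        out.append(sum(t * p for t, p in zip(tau, d)))
        nxt = [(1 - tau[0]) * d[0]]
        for k in range(1, len(d)):
            nxt.append(tau[k - 1] * d[k - 1] + (1 - tau[k]) * d[k])
        d = nxt
    return out
```

The recursive form needs only one multiplication per trial, whereas the state-based form carries the whole distribution forward; both give the same curve.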


Appendix C 
List of Symbols and Their Meanings 


a                  parameter.
$A_k$              state that a word is in after being recalled k times.
$\alpha$           parameter.
$d_n$              infinite column vector, having $p(A_k, n)$ as its elements.
D                  infinite diagonal matrix similar to T.
k                  number of times a word has been recalled.
m                  asymptotic value of $\tau_k$ and $\rho_n$.
n                  number of trial.
N                  total number of test words to be learned.
$N_{k,n}$          number of words in state $A_k$ on trial n.
$p_0$              probability of recalling a word in state $A_0$.
$p(A_k, n)$        probability that a word will be in state $A_k$ on trial n.
$r_n$              observed recall score on trial n; estimate of $\rho_n$.
$\rho_n$           probability of recall on trial n.
$S_{ij}$           elements of S.
$S^{ij}$           elements of $S^{-1}$.
S                  infinite matrix used to transform T into a similar diagonal matrix.
$t_k$              estimate of $\tau_k$.
$t_{k,n}$          observed fraction of words in state $A_k$ that are recalled on trial n.
$\tau_k$           probability of recalling a word in state $A_k$.
T                  infinite matrix of transition probabilities $\tau_k$.
Var($r_n$)         variance of the estimate of $\rho_n$.
$x_{i,k,n+1}$      random variable equal to 1 or 0.
NOTE ON THE SCALING OF RATINGS OR RANKINGS
WHEN THE NUMBERS PER SUBJECT ARE UNEQUAL


EDWARD E. CURETON


UNIVERSITY OF TENNESSEE


The average rating (or normalized rank) of a person rated by a larger
number of judges will in general be closer to the group average than will the
average rating of a person rated by a smaller number of judges, as a result
of rating unreliability and regression. This note presents a technique for
correcting that bias.


In criterion development and merit evaluation it is common practice 
to have each subject rated or ranked by as many judges as consider them- 
selves competent to do so. The investigator ordinarily starts by preparing 
a complete alphabetical list of the subjects. A copy of this list is given to 
each judge, who first crosses off the names of those he does not know well, 
and then rates or ranks the remainder. A comparable situation is encountered 
in selection when a rating blank is substituted for the usual letter of reference. 
In either case the numbers of ratings or rankings for different subjects are 
unequal, and any degree of overlap from zero to 100 per cent may be found 
among the subjects rated by any one judge and those rated by any other. In 
this situation, differences in rating standard from judge to judge must be 
considered a part of the error variance. 

A method for estimating the average reliability of the single rating has 
been given recently by Ebel.* His method may be outlined as follows: 


Let X       be a rating (or normalized rank),
    N       the number of subjects in the total group,
    $n_i$   the number of ratings of the ith subject,
    M       the mean of all ratings,
    $M_i$   the mean of the ratings of the ith subject,
    $\sum$  a summation from 1 to N,
    S       a summation from 1 to $n_i$,
    $s_w^2$ the within-persons variance of X (the error variance),
    $s_b^2$ the between-persons variance of X, and
    k       a special form of average of the N values of $n_i$.


*Ebel, Robert L. Estimation of the reliability of ratings. Psychometrika, 1951,
16, 407-424.

Then

\[ M = \frac{\sum SX}{\sum n_i}, \tag{1} \]

\[ s_w^2 = \frac{\sum SX^2 - \sum (M_i\, SX_i)}{\sum n_i - N}, \tag{2} \]

\[ M_i = \frac{SX_i}{n_i}, \tag{3} \]

\[ s_b^2 = \frac{\sum (M_i\, SX_i) - M \sum SX}{N - 1}, \tag{4} \]

\[ k = \frac{(\sum n_i)^2 - \sum n_i^2}{(N - 1) \sum n_i}, \tag{5} \]

and

\[ r_1 = \frac{s_b^2 - s_w^2}{s_b^2 + (k - 1)\, s_w^2}. \tag{6} \]


The values of $\sum SX$, $\sum SX^2$, and $\sum n_i$ can be obtained directly from the
entire set of ratings or normalized rankings. The ratings of each subject are
then counted and summed, yielding $n_i$, $SX_i$, and $M_i$, and from these values
we can compute $\sum n_i$, $\sum n_i^2$, and $\sum (M_i\, SX_i)$. The value of $\sum n_i$ obtained
at this step serves as a partial check. In computing these values we cannot
use any subject who has been rated by only one judge. $r_1$ is the average
reliability of the single rating.

If we wish to compare the average ratings of different subjects, the
situation is complicated by the fact that the error variance is greater for a
subject rated by a smaller number of judges than for a subject rated by a
larger number. The extreme average ratings at both ends of the scale will
tend to go in part to subjects rated by small numbers of judges rather than
simply to those rated high and low. This bias is systematic, and may be
removed by computing the estimated "true" rating for each subject, $X_{i\infty}$,
and using it instead of the average raw rating, $M_i$. To do this we compute
the value of $r_{ii}$ for each subject by the Spearman-Brown formula,

\[ r_{ii} = \frac{n_i r_1}{1 + (n_i - 1)\, r_1}. \tag{7} \]

Then

\[ X_{i\infty} = M(1 - r_{ii}) + r_{ii} M_i . \tag{8} \]

This last equation gives an unbiased estimate of the "true" rating.
If the original group contains persons rated by only one rater, formula
(8) may be applied to them also. For such persons, $r_{ii}$ is $r_1$ and $M_i$ is $X_i$.
The value of $r_1$ is still estimated, of course, from those cases for which two or
more ratings are available. If persons having only one rating are included in
the final group, a new value of M, including the scores of these persons, should
be computed by (1) for substitution in (8).
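
The computation in (1) through (8) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name, the dictionary input format, and the variable names are assumptions. The single-rating reliability is estimated only from subjects with two or more ratings, and the mean M used in (8) is recomputed over all subjects, as the text prescribes.

```python
def estimated_true_ratings(ratings):
    """ratings: dict mapping subject -> list of ratings.

    Returns (r1, estimates), where estimates maps each subject to the
    estimated "true" rating of formula (8).
    """
    # Only subjects with two or more ratings enter the reliability estimate.
    multi = [x for x in ratings.values() if len(x) >= 2]
    N = len(multi)
    sum_n = sum(len(x) for x in multi)
    sum_n2 = sum(len(x) ** 2 for x in multi)
    sum_x = sum(sum(x) for x in multi)
    sum_x2 = sum(v * v for x in multi for v in x)
    sum_m_sx = sum(sum(x) ** 2 / len(x) for x in multi)  # sum of M_i * SX_i

    M = sum_x / sum_n                                     # (1)
    s2_within = (sum_x2 - sum_m_sx) / (sum_n - N)         # (2)
    s2_between = (sum_m_sx - M * sum_x) / (N - 1)         # (4)
    k = (sum_n ** 2 - sum_n2) / ((N - 1) * sum_n)         # (5)
    r1 = (s2_between - s2_within) / (s2_between + (k - 1) * s2_within)  # (6)

    # New value of M including any subjects rated by a single judge.
    M_all = (sum(sum(x) for x in ratings.values())
             / sum(len(x) for x in ratings.values()))

    estimates = {}
    for subject, x in ratings.items():
        n_i = len(x)
        M_i = sum(x) / n_i                                # (3)
        r_ii = n_i * r1 / (1 + (n_i - 1) * r1)            # (7)
        estimates[subject] = M_all * (1 - r_ii) + r_ii * M_i  # (8)
    return r1, estimates
```

With a reliability between 0 and 1, each estimate lies between the subject's own mean and the group mean, and a subject with fewer ratings is pulled proportionally further toward the group mean.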


Manuscript received 4/30/52 
MULTIDIMENSIONAL SCALING: I. THEORY AND METHOD*


WARREN S. TORGERSON


SOCIAL SCIENCE RESEARCH COUNCIL 


Multidimensional scaling can be considered as involving three basic 
steps. In the first step, a scale of comparative distances between all pairs 
of stimuli is obtained. This scale is analogous to the scale of stimuli obtained 
in the traditional paired comparisons methods. In this scale, however, 
instead of locating each stimulus-object on a given continuum, the distances 
between each pair of stimuli are located on a distance continuum. As in 
paired comparisons, the procedures for obtaining a scale of comparative 
distances leave the true zero point undetermined. Hence, a comparative 
distance is not a distance in the usual sense of the term, but is a distance 
minus an unknown constant. The second step involves estimating this 
unknown constant. When the unknown constant is obtained, the com- 
parative distances can be converted into absolute distances. In the third 
step, the dimensionality of the psychological space necessary to account for 
these absolute distances is determined, and the projections of stimuli on 
axes of this space are obtained. A set of analytical procedures was developed 
for each of the three steps given above, including a least-squares solution 
for obtaining comparative distances by the complete method of triads, two 
practical methods for estimating the additive constant, and an extension of 
Young and Householder’s Euclidean model to include procedures for obtain- 
ing the projections of stimuli on axes from fallible absolute distances.


Introduction 


The traditional methods of psychophysical scaling presuppose knowledge
of the dimensions of the area being investigated. The methods require
judgments along a particular defined dimension, i.e., A is brighter, twice as
loud, more conservative, or heavier than B. The observer, of course, must
know what the experimenter means by brightness, loudness, etc. In many 
stimulus domains, however, the dimensions themselves, or even the number 
of relevant dimensions, are not known. What might appear intuitively to 
be a single dimension may in fact be a complex of several. Some of the 
intuitively given dimensions may not be necessary—it may be that they 
can be accounted for by linear combinations of others. Other dimensions of 
importance may be completely overlooked. In such areas the traditional 
approach is inadequate. 

Richardson, in 1938 (3; see also Gulliksen, 1) proposed a model for 
multidimensional scaling that would appear to be applicable to a number of 

*This study was carried out while the author was an Educational Testing Service 
Psychometric Fellow at Princeton University. The author expresses his appreciation to 


his thesis adviser, Dr. H. Gulliksen, for his guidance throughout the study and to Dr. 
B. F. Green, Jr., for valuable assistance on several of the derivations. 
these more complex areas. This model differs from the traditional scaling
methods in two important respects. First, it does not require judgments 
along a given dimension, but utilizes, instead, judgments of similarity between 
the stimuli. Second, the dimensionality, as well as the scale values, of the 
stimuli is determined from the data themselves. 

Multidimensional scaling may perhaps best be considered as involving 
three basic steps. In the first step, a scale of comparative distances between
all pairs of stimuli is obtained. The second step involves estimating an
additive constant and using this estimate to convert the comparative dis-
tances into absolute distances. In the third step, the dimensionality of the
psychological space necessary to account for these absolute distances is 
determined, and the projections of the stimuli on axes of this space are 
obtained. 


The Scale of Comparative Distances 


The scale of comparative distances obtained in the multidimensional 
methods is analogous to the one-dimensional scale of stimulus-objects obtained 
in the traditional paired comparison type methods. 

In the one-dimensional methods, the obtained scale locates the stimulus- 
objects with respect to one another on the given continuum. For example, 
given four stimulus-objects designated $S_1$, $S_2$, $S_3$, and $S_4$, the one-dimen-
sional procedure might yield the following scale:

[Figure: the stimulus-objects $S_1$, $S_2$, $S_3$, and $S_4$ located on a single continuum]

In this scale, the locations of the stimuli relative to one another only are
determined from the data. The zero point of the scale is arbitrary. While 
the usual procedure is to locate the zero point so as to coincide with the 
stimulus having the lowest scale value, any other finite location on the con- 
tinuum would serve equally well. 

In the analogous scale of comparative distances obtained in the multi- 
dimensional procedures, the element, instead of being a stimulus-object, is a 
distance between two stimuli. Thus, given the same four stimulus-objects, 
the scale of comparative distances locates, with respect to one another on a 
distance continuum, the six inter-stimulus distances, $d_{12}$, $d_{13}$, $d_{14}$, $d_{23}$, $d_{24}$,
and $d_{34}$:

[Figure: the six inter-stimulus distances located on a distance continuum]

The locations of the inter-stimulus distances relative to one another only are
determined from the data. The zero point is again arbitrarily selected. It
is important to note, however, that a comparative distance is not a "distance"
in the usual sense of the term, but is a distance minus an unknown constant.
In order to obtain absolute distances between stimuli, it is necessary to
estimate this constant. This is equivalent to estimating the true zero point
of the scale of comparative distances. Thus, a comparative distance $h_{jk}$
plus an unknown additive constant C gives the corresponding absolute
distance $d_{jk}$:

\[ h_{jk} + C = d_{jk}. \]


The Additive Constant for Converting Comparative Distances into Absolute 
Distances 

In estimating the additive constant, it is assumed that that value which 
will allow the stimuli to be fitted by a real, Euclidean space of the smallest 
possible dimensionality is the value wanted. Consider, for example, five 
points having the following comparative interpoint distances $h_{jk}$ (j, k = 1,
2, ..., 5):
his = :, his = E, hes = 1, hes Pr 0, has is —I, 


hs =2, Iks=—1, hu =4, hu =1, %Ins= 0. 
With these comparative distances the value of the additive constant 
which will allow the stimuli to be fitted by a real, Euclidean space of the 
smallest possible dimensionality is 4. If we add 4 to each of the comparative 
distances to convert them into absolute distances we obtain 
diz = 5, dy, = 5, do, = 4, dis = 3, ‘ ds; = 3, 
= 2 


dis = 6, ass = 8, dk = 5, las = 5, 
The five stimuli can be plotted in a two-dimensional space: 
[Figure: the five stimuli plotted as points in a two-dimensional space]

Note that for any smaller value of the additive constant the points do
not exist in a real Euclidean space. For example, if 1, 2, or 3 is added, then
dis + dos < do, , an impossible relationship in real Euclidean space. Also,
for any larger value of the additive constant, the points lie in a real space of
dimensionality greater than two.
Determination of the Dimensionality of the Psychological Space and the Projec-
tions of the Stimuli on Axes of the Space from the Absolute Distances Between the
Stimuli


Young and Householder (5) have given a method for determining whether 
a set of absolute interpoint distances can be considered to be the distances 
between points lying in a real Euclidean space. They also have given, pro- 
vided that the distances can so be considered, methods for determining the 
dimensionality of the space, and the projections of the points on a set of 
orthogonal axes of the space. Their theorems involve two basic matrices, 
$B_i$ and F.


If we let

i, j, and k be alternate subscripts for n points (i, j, k = 1, 2, ..., n) and
$d_{ij}$, $d_{ik}$, and $d_{jk}$ be the distances between the points, then $B_i$ is an
(n − 1) × (n − 1) symmetric matrix with elements

\[ b_{jk} = \tfrac{1}{2}\,(d_{ij}^2 + d_{ik}^2 - d_{jk}^2). \tag{1} \]

The element $b_{jk}$ may be considered to be the scalar product of vectors
from point i to points j and k. This follows directly from the cosine law.
That is, given the three points i, j, and k,

\[ d_{jk}^2 = d_{ij}^2 + d_{ik}^2 - 2\, d_{ij} d_{ik} \cos\theta_{jik}, \tag{2} \]

which rearranged becomes

\[ d_{ij} d_{ik} \cos\theta_{jik} = \tfrac{1}{2}\,(d_{ij}^2 + d_{ik}^2 - d_{jk}^2). \tag{3} \]

From Equations 1 and 3, it is seen that $b_{jk} = d_{ij} d_{ik} \cos\theta_{jik}$, the scalar product
of vectors from point i to points j and k. Matrix $B_i$ is thus a matrix of scalar
products of vectors with origin at point i. There are, of course, n possible
$B_i$ matrices, since i may assume any value from 1 to n.

Matrix F is an (n + 1) × (n + 1) symmetric matrix of squares of inter-
point distances bordered by a row and column of ones as follows:

\[
F = \begin{bmatrix}
0 & d_{12}^2 & d_{13}^2 & \cdots & d_{1n}^2 & 1 \\
d_{21}^2 & 0 & d_{23}^2 & \cdots & d_{2n}^2 & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
d_{n1}^2 & d_{n2}^2 & d_{n3}^2 & \cdots & 0 & 1 \\
1 & 1 & 1 & \cdots & 1 & 0
\end{bmatrix}
\tag{4}
\]
Young and Householder have shown that:


1. If any matrix $B_i$ is positive semidefinite, the distances may be con-
sidered to be the distances between points lying in a real Euclidean
space.

2. The rank of any positive semidefinite matrix $B_i$ is equal to the di-
mensionality of the set of points.

3. The rank of matrix F is two greater than the dimensionality of the
set of points.

4. Any positive semidefinite matrix $B_i$ may be factored to obtain a
matrix $A_i$ such that

\[ B_i = A_i A_i'. \tag{5} \]

If the rank of $B_i$ is r, where r ≤ (n − 1), then matrix $A_i$ is an (n − 1)
× r matrix of projections of points on r orthogonal axes with origin
at the ith point of the r-dimensional, real Euclidean space.
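
The matrices $B_i$ and F can be built directly from a table of distances. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the five planar points used to exercise it are made up for the purpose (they are not taken from the example above). Rank is computed by ordinary Gaussian elimination.

```python
from math import dist


def scalar_product_matrix(d, i):
    """B_i with elements b_jk = (d_ij^2 + d_ik^2 - d_jk^2) / 2, origin at point i."""
    others = [j for j in range(len(d)) if j != i]
    return [[(d[i][j] ** 2 + d[i][k] ** 2 - d[j][k] ** 2) / 2 for k in others]
            for j in others]


def bordered_matrix(d):
    """F: squared distances bordered by a row and a column of ones."""
    n = len(d)
    rows = [[d[j][k] ** 2 for k in range(n)] + [1.0] for j in range(n)]
    rows.append([1.0] * n + [0.0])
    return rows


def rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = max(range(r, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


# Hypothetical configuration: five points in a plane, the last at the origin.
points = [(-3, 0), (3, 0), (0, 4), (0, -4), (0, 0)]
d = [[dist(p, q) for q in points] for p in points]
B = scalar_product_matrix(d, 4)   # origin at the fifth point
```

For points lying in a plane, the rank of B comes out as 2 and the rank of F as 4, in line with statements 2 and 3.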


It is interesting to note that except for Richardson’s original experiment 
(only an abstract of which has been published) only one person, Klingberg, 
(2) has used the model. It may well be that one of the reasons for the lack of 
experimental investigation in this area is that no clear statement of analytical 
procedure has been published. The problem of precisely how to proceed in 
obtaining comparative distances from proportions of judgments has not been 
adequately answered for either Richardson’s method of triadic combinations or 
Klingberg’s method of multidimensional rank order. While the analogy 
between the logic of paired comparisons and both of these methods is clear, 
the procedures cannot be directly applied in obtaining an efficient estimate. 
The least-squares solution for paired comparisons scales cannot be used 
because the analogous proportion matrix contains a rather large number of 
vacant cells—neither multidimensional method obtains judgments of the 
differences in distance between all possible pairs of distances, but only between 
pairs having one stimulus in common. Furthermore, in reducing the matrix 
of distance-differences between pairs to a scale of comparative distances, 
one is almost overwhelmed by the great number of possible modes of attack— 
each likely to give a somewhat different answer due to error in the observed 
data. 

The problem of how to obtain a best estimate of the unknown additive 
constant has not been answered. The method used by Klingberg is quite 
tedious (it involved obtaining two tenth-order polynomials from the fifth- 
order minors and then solving for the unknowns) and does not insure that 
the answer obtained is a best estimate, or that it even approximates the value 
desired. 

Similarly, while Young and Householder give adequate procedures for 
obtaining projections of points on axes from distances when the data are 
infallible, a number of difficulties arise when fallible data are employed. 










The purpose of the present paper is to present a set of analytical pro- 
cedures for multidimensional scaling, including, as far as possible, routine 
procedures for obtaining comparative distances, for estimating the additive 
constant, and for obtaining projections of stimuli on axes when fallible 
absolute distances are given. We shall first consider the complete method 
of triads for obtaining comparative distances between the stimuli. Follow- 
ing this, the problem of obtaining projections of stimuli on axes from fallible 
absolute distances will be discussed. Finally, we shall consider various 
methods for estimating the unknown additive constant. 


The Complete Method of Triads for 
Obtaining Comparative Distances Between Stimuli 


The stimuli are presented to the subject in triads. The judgment re-
quired of the subject is of the form: "Stimulus k is more similar to stimulus
j than to stimulus i." With n stimuli, there are n(n − 1)(n − 2)/6 triads.
In each triad, each stimulus is compared with each other pair, making a
total of n(n − 1)(n − 2)/2 judgments for each subject. From these judg-
ments we obtain the proportion of times any stimulus k is judged more
similar to stimulus j than to i.
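
The counting of triads and judgments can be made concrete with a short enumeration. This is a sketch only; the function name and the (k, i, j) tuple layout are assumptions introduced for illustration.

```python
from itertools import combinations


def triad_judgments(n):
    """Yield (k, i, j) for every judgment in the complete method of triads.

    Each tuple stands for one question: is stimulus k more similar to
    stimulus j than to stimulus i? Stimuli are numbered 1..n and i < j.
    """
    for triad in combinations(range(1, n + 1), 3):
        for k in triad:
            i, j = (s for s in triad if s != k)
            yield k, i, j


judgments = list(triad_judgments(4))
```

With four stimuli there are 4 triads and each contributes 3 judgments, so the list has 12 entries, which agrees with n(n − 1)(n − 2)/2.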

These proportions can be arranged in the n matrices ${}_kP_{ij}$ where k, i,
and j are alternate subscripts for the stimuli. k gives the number of the
matrix, i is a row index, and j is a column index. The element ${}_kp_{ij}$ is the
proportion of times stimulus k is judged closer to stimulus j than to i. The
matrices ${}_kP_{ij}$ have vacant cells in the principal diagonal, and in the kth row
and column.* The matrices are such that the sum of symmetric elements
is unity, that is, ${}_kp_{ij} + {}_kp_{ji} = 1$. For example, given four stimuli, 1, 2, 3, and
4, there are four ${}_kP_{ij}$ matrices. The second matrix (k = 2) is illustrated below:

                        ${}_2P_{ij}$

            1               2               3               4
1           .               .         ${}_2p_{13}$    ${}_2p_{14}$
2           .               .               .               .
3     ${}_2p_{31}$          .               .         ${}_2p_{34}$
4     ${}_2p_{41}$          .         ${}_2p_{43}$          .


*It might be noted that the elements in the kth row and column could be obtained
experimentally. However, since the method would ordinarily be used in connection with
supraliminal distances, the experimentally determined proportions would be either .00
or 1.00. As in paired comparisons, proportions of .00 and 1.00 cannot be utilized.

The first problem is to transform the proportions ${}_kp_{ij}$ into differences in
distances ${}_kx_{ij}$. We shall assume that the proportion of times stimulus k is
judged closer to stimulus j than to i is a function of the difference in the
distances, $d_{ik} - d_{jk} = {}_kx_{ij}$, the function being

\[ {}_kp_{ij} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{{}_kx_{ij}/\sqrt{2}} e^{-x^2/2}\, dx. \tag{6} \]

${}_kx_{ij}$ is thus $\sqrt{2}$ times the deviate of the unit normal curve measured
in σ units from the mean. This is analogous to Thurstone's Case V in paired
comparisons (4) and is the same assumption used previously by Richardson
(3).
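
The transformation from a proportion to a difference in distances can be sketched with the standard library's inverse normal function. The function name is an assumption; the factor of the square root of 2 follows the statement above that the difference is that multiple of the unit normal deviate.

```python
from math import sqrt
from statistics import NormalDist


def proportion_to_difference(p):
    """Return the difference in distances for an observed proportion p.

    The result is sqrt(2) times the unit-normal deviate whose cumulative
    probability is p. Proportions of .00 and 1.00 cannot be used.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("proportions of .00 and 1.00 cannot be utilized")
    return sqrt(2) * NormalDist().inv_cdf(p)
```

A proportion of .50 maps to a difference of zero, and complementary proportions map to differences of equal size and opposite sign, which is why the resulting matrices are skew symmetric.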

Making this transformation we obtain the n matrices ${}_kX_{ij}$. These
matrices are skew symmetric (${}_kx_{ij} + {}_kx_{ji} = 0$), have zero diagonal elements,
and have vacant cells in the kth row and column.

We have n(n − 1)(n − 2)/2 independent observations of differences in
distances ${}_kx_{ij}$ from which we wish to determine n(n − 1)/2 comparative
distances $h_{jk}$. Since ½n(n − 1) − 1 differences in distances are sufficient to
determine a matrix of comparative distances, it is apparent that the data
are considerably overdetermined. There are, of course, a large number of
sets of ½n(n − 1) − 1 differences in distances which could be used. Also,
there are many different ways of obtaining the comparative distances from
each set. With fallible data, the matrices could be expected to differ some-
what from each other.

The first problem, then, is to find a best estimate, in a least-squares
sense, of the matrix of comparative distances $h_{jk}$ in terms of the available
data.

The element ${}_kx_{ij} = d_{ik} - d_{jk} + {}_ke_{ij}$, where $d_{ik}$ and $d_{jk}$ are absolute
distances between stimuli k and i, and k and j, respectively, and ${}_ke_{ij}$ is an
error.* It would seem that we want that set of interpoint distances which
minimizes the sum of squares of the errors ${}_ke_{ij}$. For a least-squares solution,
then, we wish to select the distances to minimize the following function:

\[ 2F = \sum_{k} \sum_{i \ne k} \sum_{j \ne k,\, j \ne i} \left[{}_kx_{ij} - d_{ik} + d_{jk}\right]^2. \tag{7} \]

If we define a set of matrices ${}_kE_{ij}$ with elements (${}_kx_{ij} - d_{ik} + d_{jk}$), it is
seen that 2F is equal to the sum of squares of elements of the matrices ${}_kE_{ij}$.
Let g and h correspond to two particular stimuli with $d_{gh} = d_{hg}$, the
distance between them. The term $d_{gh}$ (or $d_{hg}$) occurs only in the error matrices
${}_gE_{ij}$ and ${}_hE_{ij}$ as follows:
in ${}_gE_{ij}$ the hth column contains the elements ${}_gx_{ih} - d_{ig} + d_{gh}$,
the hth row contains the elements ${}_gx_{hj} - d_{hg} + d_{gj}$;
in ${}_hE_{ij}$ the gth column contains the elements ${}_hx_{ig} - d_{ih} + d_{hg}$,
the gth row contains the elements ${}_hx_{gj} - d_{gh} + d_{hj}$.


*${}_kx_{ij}$ is also equal to the difference between the comparative distances, since the
difference in absolute distances $d_{ik} - d_{jk}$ is identical with the difference in comparative
distances $(d_{ik} - C) - (d_{jk} - C)$.
To minimize 2F, we first take the derivative of F with respect to $d_{gh}$. We
shall designate this derivative as F'. It is apparent that the derivatives of
all terms of F except those containing the element $d_{gh}$ vanish. Therefore,

\[
\begin{aligned}
F' ={}& \sum_{i \ne g,h} ({}_gx_{ih} - d_{ig} + d_{gh}) - \sum_{j \ne g,h} ({}_gx_{hj} - d_{hg} + d_{gj}) \\
     &+ \sum_{i \ne g,h} ({}_hx_{ig} - d_{ih} + d_{hg}) - \sum_{j \ne g,h} ({}_hx_{gj} - d_{gh} + d_{hj}).
\end{aligned}
\tag{8}
\]

But, since matrices ${}_gE_{ij}$ and ${}_hE_{ij}$ are skew symmetric,

\[ \sum_{i \ne g,h} ({}_gx_{ih} - d_{ig} + d_{gh}) = - \sum_{j \ne g,h} ({}_gx_{hj} - d_{hg} + d_{gj}), \tag{9} \]

and

\[ \sum_{i \ne g,h} ({}_hx_{ig} - d_{ih} + d_{hg}) = - \sum_{j \ne g,h} ({}_hx_{gj} - d_{gh} + d_{hj}). \tag{10} \]

Therefore, we may write

\[ F' = 2 \sum_{i \ne g,h} ({}_gx_{ih} - d_{ig} + d_{gh}) + 2 \sum_{i \ne g,h} ({}_hx_{ig} - d_{ih} + d_{hg}). \tag{11} \]

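Setting $F'$ equal to zero, and summing over each term, we find

$$\sum_{i \neq g,h} {}_g x_{ih} - \sum_{i \neq g,h} d_{ig} + (n-2)d_{gh} + \sum_{i \neq g,h} {}_h x_{ig} - \sum_{i \neq g,h} d_{ih} + (n-2)d_{hg} = 0. \tag{12}$$

Remembering that $d_{gg} = d_{hh} = 0$, and that the diagonals and $k$th row
and column of all ${}_k X_{ij}$ matrices are vacant, we can write:

$$\sum_{i} {}_g x_{ih} - \sum_{i \neq h} d_{ig} + (n-2)d_{gh} + \sum_{i} {}_h x_{ig} - \sum_{i \neq g} d_{ih} + (n-2)d_{hg} = 0. \tag{13}$$

Subtracting $d_{gh}$ from $-\sum_{i \neq h} d_{ig}$ and from $-\sum_{i \neq g} d_{ih}$, adding $d_{gh}$ to $(n-2)d_{gh}$
and $(n-2)d_{hg}$, and remembering that $d_{gh} = d_{hg}$, we have

$$\sum_{i} {}_g x_{ih} - \sum_{i} d_{ig} + (n-1)d_{gh} + \sum_{i} {}_h x_{ig} - \sum_{i} d_{ih} + (n-1)d_{hg} = 0. \tag{14}$$

Summing over $g$, $g \neq h$, we have

$$\sum_{g \neq h}\sum_{i} {}_g x_{ih} - \sum_{g \neq h}\sum_{i} d_{ig} + (n-1)\sum_{g \neq h} d_{gh} + \sum_{g \neq h}\sum_{i} {}_h x_{ig} - (n-1)\sum_{i} d_{ih} + (n-1)\sum_{g \neq h} d_{hg} = 0. \tag{15}$$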


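But

$$\sum_{g \neq h}\sum_{i} {}_h x_{ig} = 0, \tag{16}$$

and, since $d_{hh} = 0$,

$$\sum_{i} d_{ih} = \sum_{g \neq h} d_{hg}; \tag{17}$$

therefore,

$$\sum_{g \neq h}\sum_{i} {}_g x_{ih} - \sum_{g \neq h}\sum_{i} d_{ig} + (n-1)\sum_{g \neq h} d_{gh} = 0. \tag{18}$$

Subtracting $\sum_{i} d_{ih}$ from $-\sum_{g \neq h}\sum_{i} d_{ig}$ and adding $\sum_{g \neq h} d_{gh}$ to $(n-1)\sum_{g \neq h} d_{gh}$, we
see, from Equation (17), that

$$\sum_{g \neq h}\sum_{i} {}_g x_{ih} - \sum_{g}\sum_{i} d_{ig} + n\sum_{g} d_{gh} = 0. \tag{19}$$

Rearranging, dividing by $n(n-1)$, and remembering that cells ${}_h x_{ih}$ are vacant,
we find

$$\frac{1}{n(n-1)}\sum_{g}\sum_{i} {}_g x_{ih} = \frac{1}{n(n-1)}\sum_{g}\sum_{i} d_{ig} - \frac{1}{n-1}\sum_{g} d_{gh}. \tag{20}$$

Also, if we divide Equation (14) by $2(n-1)$ and rearrange, we obtain

$$\frac{1}{2(n-1)}\left[\sum_{i} d_{ig} + \sum_{i} d_{ih}\right] - d_{gh} = \frac{1}{2(n-1)}\left[\sum_{i} {}_g x_{ih} + \sum_{i} {}_h x_{ig}\right]. \tag{21}$$

It will be convenient to define the averages in Equations (20) and (21)
as follows:

$$d_{\cdot h} = \frac{1}{n-1}\sum_{g} d_{gh} = \frac{1}{n-1}\sum_{i} d_{ih}, \tag{22}$$
$$d_{\cdot g} = \frac{1}{n-1}\sum_{i} d_{ig}, \tag{23}$$
$$d_{\cdot\cdot} = \frac{1}{n(n-1)}\sum_{g}\sum_{i} d_{ig}, \tag{24}$$
$${}_g x_{\cdot h} = \frac{1}{n-1}\sum_{i} {}_g x_{ih}, \tag{25}$$
$${}_h x_{\cdot g} = \frac{1}{n-1}\sum_{i} {}_h x_{ig}, \tag{26}$$
$${}_\cdot x_{\cdot h} = \frac{1}{n(n-1)}\sum_{g}\sum_{i} {}_g x_{ih}. \tag{27}$$

After substitutions have been made for the appropriate terms, Equation (21)
becomes

$$\tfrac{1}{2}(d_{\cdot g} + d_{\cdot h}) - d_{gh} = \tfrac{1}{2}({}_g x_{\cdot h} + {}_h x_{\cdot g}); \tag{28}$$

and Equation (20) becomes

$${}_\cdot x_{\cdot h} = d_{\cdot\cdot} - d_{\cdot h}, \tag{29}$$

and, when $h = g$,

$${}_\cdot x_{\cdot g} = d_{\cdot\cdot} - d_{\cdot g}.$$

Substituting for $d_{\cdot g}$ and $d_{\cdot h}$ in Equation (28), we have

$$\tfrac{1}{2}(2d_{\cdot\cdot} - {}_\cdot x_{\cdot g} - {}_\cdot x_{\cdot h}) - d_{gh} = \tfrac{1}{2}({}_g x_{\cdot h} + {}_h x_{\cdot g}), \tag{30}$$

which rearranged becomes

$$d_{\cdot\cdot} - d_{gh} = \tfrac{1}{2}({}_g x_{\cdot h} + {}_\cdot x_{\cdot g} + {}_h x_{\cdot g} + {}_\cdot x_{\cdot h}). \tag{31}$$

When $g = j$, $h = k$, the comparative distance $h_{jk} = d_{\cdot\cdot} - d_{jk}$. Since
the $x$-values are functions of the observed proportions (Equation (6)), Equation (31)
gives the comparative distances as functions of the observed data.
Equation (31), then, gives a rather straightforward method for obtaining
the best estimate, in a least-squares sense, of the matrix of comparative
distances.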
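
As an illustration of Equation (31), the following is a minimal sketch in Python, not part of the original paper. It assumes the triad deviates are stored as a nested list `X[k][i][j]` with vacant cells set to zero; the function names and the errorless test data are hypothetical and serve only to check that the averages reproduce $d_{\cdot\cdot} - d_{gh}$.

```python
import math


def comparative_distances(X):
    """Comparative distances h[g][h] from triad matrices, per Equation (31).

    X[k][i][j] holds the deviate kx_ij; vacant cells (i == k, j == k,
    or i == j) are assumed to be stored as 0.0.
    """
    n = len(X)
    # col_avg[k][j] = kx.j, the column averages of matrix k (Equation 25)
    col_avg = [[sum(X[k][i][j] for i in range(n)) / (n - 1)
                for j in range(n)] for k in range(n)]
    # grand[j] = .x.j, the average of the column averages (Equation 27)
    grand = [sum(col_avg[k][j] for k in range(n)) / n for j in range(n)]
    # Equation (31): h_gh = 1/2 (gx.h + .x.g + hx.g + .x.h); diagonal left undefined
    return [[0.5 * (col_avg[g][h] + grand[g] + col_avg[h][g] + grand[h])
             if g != h else None
             for h in range(n)] for g in range(n)]


def errorless_triads(points):
    """Hypothetical errorless data: kx_ij = d_ik - d_jk, vacant cells zero."""
    n = len(points)
    D = [[math.dist(p, q) for q in points] for p in points]
    X = [[[D[i][k] - D[j][k] if k not in (i, j) else 0.0
           for j in range(n)] for i in range(n)] for k in range(n)]
    return D, X
```

With errorless input the off-diagonal entries returned by `comparative_distances` equal the average interpoint distance minus each absolute distance, which is the relation stated below Equation (31).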


Obtaining Projections of Stimuli on Axes from Fallible Absolute Distances 


For a situation in which the data are not fallible and in which absolute 
distances are given, Young and Householder have shown (a) how to deter- 
mine if the stimuli lie in a real Euclidean space, (b) if they do, how to deter- 
mine the dimensionality of the set of points, and (c) how to obtain the 
projections of the points on an arbitrary orthogonal reference system. This 
reference system may then be rotated to the “most meaningful” dimensions, 
if criteria for such are available. 

We saw that if matrix $B_i$ (Equation 5) is positive semidefinite, the
stimuli lie in real Euclidean space. The rank of $B_i$ (or two less than the rank
of matrix $F$) is then equal to the dimensionality of the set of stimuli. Matrix
$B_i$ can be factored to obtain projections of the stimuli on an arbitrary set of
orthogonal axes.

Matrix $B_i$, however, is constructed by placing the origin arbitrarily at
one of the stimuli. With errorless data, the results will be identical (except
for the orientation of axes and location of the origin) for each of the $n$ possible
matrices $B_i$ ($i = 1, 2, \ldots, n$). With fallible data, however, each point is
somewhat in error. Assuming a true rank considerably less than the number
of points, each matrix $B_i$ will yield different results. We would then have the
problem of deciding which $B_i$ matrix gives the best solution.

One solution to this problem would be to place the origin at the centroid
of the stimuli. This procedure would give a unique solution and would tend
to allow the errors in the individual points to cancel each other. An origin at
the centroid would, on the average, be less in error than an origin at any
arbitrary stimulus. The problem would seem to be to find a convenient
method of obtaining a matrix $B^*$ with origin at the centroid of the stimuli
instead of at one of the stimulus points.


We shall use the following notation:

$m$ = axes ($m = 1, 2, \ldots, r$),
$j, k$ = points ($j, k = 1, 2, \ldots, n$),
$i$ = point taken as the origin,
$a_{jm}$ = projection of point $j$ on axis $m$, and
$d_{jk}$ = distance between points $j$ and $k$;

and take as given Equations (1) and (5):

$$B_i = AA'$$
$$b_{jk} = \tfrac{1}{2}(d_{ij}^2 + d_{ik}^2 - d_{jk}^2),$$

where point $i$ is taken as the origin. From Equation (5) it is seen that

$$b_{jk} = \sum_{m} a_{jm} a_{km}. \tag{32}$$

We shall, however, consider $B_i$ to be an $n \times n$ matrix with the $i$th row
and column composed of zero elements. In like manner $A$ is $n \times r$ with
the $i$th row composed of zero elements.

We wish to translate the axes from an origin at point $i$ to an origin at
the centroid of all points.

Let $A^* = \| a_{jm^*} \|$ be the desired matrix of projections of points $j$ on
axis $m^*$ of the new coordinate system with origin at the centroid of the $n$
points.

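Then

$$a_{jm^*} = a_{jm} - c_m, \tag{33}$$

where

$$c_m = \frac{1}{n}\sum_{j=1}^{n} a_{jm} \tag{34}$$

is the average projection of points on axis $m$, that is, the projection of the centroid on axis $m$.

$$B^* = A^*A^{*\prime} = \| b^*_{jk} \|, \tag{35}$$

and

$$b^*_{jk} = \sum_{m} a_{jm^*} a_{km^*}. \tag{36}$$

Substituting, we have

$$b^*_{jk} = \sum_{m} (a_{jm} - c_m)(a_{km} - c_m) = \sum_{m} a_{jm}a_{km} - \sum_{m} a_{jm}c_m - \sum_{m} a_{km}c_m + \sum_{m} c_m c_m. \tag{37}$$

From Equation (34) it is seen that

$$b^*_{jk} = \sum_{m} a_{jm}a_{km} - \frac{1}{n}\sum_{m} a_{jm}\sum_{k} a_{km} - \frac{1}{n}\sum_{m} a_{km}\sum_{j} a_{jm} + \frac{1}{n^2}\sum_{m}\Bigl[\sum_{j} a_{jm}\Bigr]\Bigl[\sum_{k} a_{km}\Bigr]. \tag{38}$$

But

$$\frac{1}{n}\sum_{k} b_{jk} = \frac{1}{n}\sum_{m} a_{jm}\sum_{k} a_{km} \tag{39}$$

and

$$\frac{1}{n^2}\sum_{j}\sum_{k} b_{jk} = \frac{1}{n^2}\sum_{m}\Bigl[\sum_{j} a_{jm}\Bigr]\Bigl[\sum_{k} a_{km}\Bigr]. \tag{40}$$

Substituting, we have

$$b^*_{jk} = b_{jk} - \frac{1}{n}\sum_{k} b_{jk} - \frac{1}{n}\sum_{j} b_{jk} + \frac{1}{n^2}\sum_{j}\sum_{k} b_{jk}. \tag{41}$$

Equation (41) gives a routine method of translating a matrix $B_i$ with origin
at point $i$ to an equivalent matrix $B^*$ with an origin at the centroid of the
points. It makes no difference, of course, which of the $n$ matrices $B_i$ is used
in obtaining matrix $B^*$.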

Matrix B*, then, is the B-matrix we wish to factor to obtain projections 
of stimuli on axes. 
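
The translation to the centroid in Equation (41) can be sketched as follows. This is an illustrative Python sketch, not the author's procedure as printed: it assumes errorless absolute distances supplied as a full symmetric list of lists, and the function names are hypothetical.

```python
import math


def distance_matrix(points):
    """Euclidean distances between all pairs of points (illustrative input)."""
    return [[math.dist(p, q) for q in points] for p in points]


def b_matrix(D, origin):
    """Equation (5): b_jk = 1/2 (d_ij^2 + d_ik^2 - d_jk^2) with origin at point i.

    The row and column of the origin come out as zeros automatically.
    """
    n, i = len(D), origin
    return [[0.5 * (D[i][j] ** 2 + D[i][k] ** 2 - D[j][k] ** 2)
             for k in range(n)] for j in range(n)]


def centroid_b_matrix(D, origin=0):
    """Equation (41): subtract row and column means, add the grand mean."""
    B = b_matrix(D, origin)
    n = len(B)
    row_mean = [sum(B[j]) / n for j in range(n)]
    col_mean = [sum(B[j][k] for j in range(n)) / n for k in range(n)]
    grand_mean = sum(row_mean) / n
    return [[B[j][k] - row_mean[j] - col_mean[k] + grand_mean
             for k in range(n)] for j in range(n)]
```

Calling `centroid_b_matrix` with different values of `origin` on the same distances should return the same matrix up to rounding, which is the invariance the text asserts for $B^*$.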




Estimating $d_{\cdot\cdot}$, the Unknown Additive Constant


The procedures of obtaining dimensionality and projections on axes
discussed in the preceding section require absolute distances as given data.
When the given data are comparative distances $(h_{jk} = d_{\cdot\cdot} - d_{jk})$†, the unknown
constant must be estimated to convert the comparative distances
into absolute distances. We shall first consider the case where the data are
not fallible, after which we shall discuss procedures for fallible data.


1. Estimating $d_{\cdot\cdot}$ from errorless comparative distances


With errorless data, in order that the stimuli be considered as lying in a
real Euclidean space of $r$ dimensions, the $B_i$ matrix must be positive semidefinite
and have a rank equal to $r$. This is equivalent to the statement that
$r$ latent roots of $B_i$ must be positive and the remaining $(n - r)$ equal to zero.


†Comparative distances with signs reversed, actually (his = — hia).

The value of $d_{\cdot\cdot}$ desired is the value which will permit the location of the
stimuli in a real, Euclidean space of the smallest possible dimensionality. In
terms of the matrix $B^*$, the value of $d_{\cdot\cdot}$ desired is that value which results in
the positive semidefinite $B^*$ with the lowest rank. In terms of the latent
roots, this becomes that value which results in a matrix $B^*$ with the largest
number of zero roots under the condition that the remaining non-zero roots
are all positive.

This value can be determined, although it involves a tremendous amount 
of labor. The straightforward solution would be as follows: 


1. Construct matrix $B^*$ from the given data ($d_{jk} = d_{\cdot\cdot} - h_{jk}$).

2. Obtain the characteristic equation: $|B^* - \lambda I| = 0$.

3. Set the last term equal to zero and solve for the real, positive values
of $d_{\cdot\cdot}$. This term will be an $(n - 1)$th degree polynomial in $d_{\cdot\cdot}$. One
of these values is the value desired.

4. Substitute each of the values for $d_{\cdot\cdot}$ in the complete characteristic
equation. Inspection of these equations shows which value of $d_{\cdot\cdot}$
yields the largest number of zero roots.

5. The value which yields the largest number of zero roots with the
remaining roots all positive is the value desired.


A "short-cut" procedure would be to evaluate the determinant of $B^*$
directly. This determinant is the last term of the characteristic equation.
One could then obtain the real, positive values of $d_{\cdot\cdot}$ as in (3) above. Each
value could then be substituted for $d_{\cdot\cdot}$ in $B^*$. The latent roots of $B^*$ could
then be computed for each real positive value of $d_{\cdot\cdot}$. One would be the
desired value. This method would also involve a prohibitive amount of labor.

A third method would be to first estimate the dimensionality of the set
of stimuli. To check the estimate, one could obtain an estimate of $d_{\cdot\cdot}$ by
evaluating one (or more) of the principal minors of $B^*$ having an order equal
to one greater than the estimated dimensionality. This estimate could then
be substituted into $B^*$ and the latent roots calculated.

There are a number of other methods possible involving the principal
minors of the $B^*$ matrix.‡ In general, they would all hinge on the fact that
the correct estimate of $d_{\cdot\cdot}$, if used in $B^*$, results in (a) all principal minors of
order greater than the dimensionality of the stimuli being null, and (b) all
principal minors of order equal to or less than the dimensionality being
non-negative.


‡One could also use the matrix $F$ (Equation 4) to obtain the value of $d_{\cdot\cdot}$ which
would minimize the dimensionality of the set of stimuli, since it is known that the rank of
$F$ is two greater than the rank of $B_i$. There would seem to be little point in this, however,
since $F$ is a larger matrix, therefore involving more tedious procedures in evaluating
the determinants, and since no properties of $F$ have been given from which to determine
whether the value of $d_{\cdot\cdot}$ obtained will allow the stimuli to be placed in a real space.

It is interesting to note that unless all points are equidistant from each
other (in which case things collapse down to zero dimensions) it is always
possible to obtain an estimate of $d_{\cdot\cdot}$ in which the rank of $B^*$ is at least two
less than the number of stimuli. Thus before one could place any confidence
in the obtained dimensionality of the stimuli, the rank of $B^*$ would have
to be smaller by three than the number of stimuli and preferably considerably
smaller.

All of these methods require a great deal of labor. In addition, when 
fallible data are used, it is doubted whether the methods would give the 
solution we wish. We would probably never obtain a positive semidefinite B* 
matrix with a rank less than the number of stimuli minus two from fallible data. 


2. Estimating $d_{\cdot\cdot}$ from fallible comparative distances

If we could obtain a positive semidefinite $B^*$ whose non-zero roots consisted
of a few large positive roots and a number of small positive roots by
the methods outlined in the previous section, we could probably discard the
small roots and conclude that the true dimensionality is equal to the number
of large positive roots. Even in this case, however, there would probably
be a better estimate of $d_{\cdot\cdot}$. In the above example we have essentially assumed
that any error must be such as to change the zero roots to positive
values. It would be more reasonable to assume that errors would tend to
change some zero roots in the positive direction and some in the negative
direction. If we think of 3 points lying in a line so that $d_{12} + d_{23} = d_{13}$, the
former would hold that any error would tend to make $(d_{12} + e_{12}) + (d_{23} + e_{23}) > (d_{13} + e_{13})$,
whereas the latter would hold that $(d_{12} + e_{12}) + (d_{23} + e_{23}) < (d_{13} + e_{13})$ is equally likely.

This means that with fallible data the condition that B* be positive 
semidefinite as a criterion for the points’ existence in real space is not to be 
taken too seriously. What we would like to obtain is a B*-matrix whose 
latent roots consist of 


1. A few large positive values (the "true" dimensions of the system), and

2. The remaining values small and distributed about zero (the "error" dimensions).


It may be that for fallible data we are asking the wrong question. Consider
the question, "For what value of $d_{\cdot\cdot}$ will the points be most nearly (in
a least-squares sense) in a space of a given dimensionality?" When one is
interested in, or has reason to suspect, a one-dimensional case, the best $d_{\cdot\cdot}$
in a least-squares sense is rather easy to obtain. In a one-dimensional set
of points, $d_{ij} + d_{jk} = d_{ik} + e$, where $j$ is between $i$ and $k$, and $e$ is an error.



















In terms of available data $(d_{\cdot\cdot} - d_{jk} = h_{jk})$, this becomes

$$d_{\cdot\cdot} - h_{ij} + d_{\cdot\cdot} - h_{jk} = d_{\cdot\cdot} - h_{ik} + e$$

or

$$d_{\cdot\cdot} + h_{ik} - h_{ij} - h_{jk} = e.$$

The $d_{\cdot\cdot}$ which will minimize the sums of squares of all of the $e$'s would
seem to be the $d_{\cdot\cdot}$ desired. If we let

$$2F = \sum_{k>j>i} (d_{\cdot\cdot} + h_{ik} - h_{ij} - h_{jk})^2, \tag{42}$$

then, to minimize $2F$, we take the derivative of $F$ with respect to $d_{\cdot\cdot}$ and set
it equal to zero. Designating this derivative as $F'$, we have

$$F' = \sum_{k>j>i} (h_{ik} - h_{ij} - h_{jk}) + \sum_{k>j>i} d_{\cdot\cdot} = 0, \tag{43}$$

which rearranged becomes

$$\sum_{k>j>i} d_{\cdot\cdot} = \sum_{k>j>i} (h_{ij} + h_{jk} - h_{ik}). \tag{44}$$

Dividing by $n(n-1)(n-2)/6$, the number of sets of three stimuli, we find

$$d_{\cdot\cdot} = \frac{6}{n(n-1)(n-2)} \sum_{k>j>i} (h_{ij} + h_{jk} - h_{ik}). \tag{45}$$

In the one-dimensional case, it will ordinarily be possible to obtain the
order of the $n$ stimuli. If we define a symmetric matrix $H_{jk}$ ($j$ designates
rows, $k$ designates columns, $j, k = 1, 2, \ldots, n$) composed of elements $h_{jk}$,
the sum of the columns of $H_{jk}$ divided by $(n - 1)$ gives the average distance
of all points from each other minus the average distance from point $k$ to all
other points. Small values of $h_{\cdot k}$ indicate that $k$ is near one end of the continuum,
and large values indicate that $k$ is near the center. Inspection of
$H_{jk}$ will ordinarily suffice to determine on which side of the continuum the
particular stimulus is located. Given matrix $H_{jk}$ with rows and columns in
correct order, a shortcut method of obtaining

$$\sum_{k>j>i} (h_{ij} + h_{jk} - h_{ik}) = L$$

is to

1. Obtain the diagonal sums $S_c$ of elements above the principal diagonal:

$$S_c = \sum_{i=1}^{n-c} h_{i(i+c)} \qquad (c = 1, 2, \ldots, n-1). \tag{46}$$

2. Multiply $S_c$ by $(n - 2c)$. The sum of these products is equal to $L$; i.e., if we let $t_c = (n - 2c)$,

$$L = \sum_{c=1}^{n-1} S_c t_c. \tag{47}$$

For the case where the dimensionality is expected to be greater than one,
this general approach does not seem to be very practical. While one could
think of finding that $d_{\cdot\cdot}$ which will minimize the sums of squares of volumes
of all possible tetrahedrons for the two-dimensional case, and the corresponding
hyper-volumes for the higher-dimensional cases, it would seem that the
labor involved would again be prohibitive.
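
For the one-dimensional case, the agreement between the triple sum in Equation (45) and the diagonal shortcut of Equations (46) and (47) can be checked numerically. The sketch below is illustrative only; it assumes `H` is a symmetric list of lists whose rows and columns are already in the order of the stimuli along the continuum, and the function names are hypothetical.

```python
from itertools import combinations


def triple_sum(H):
    """L as the direct sum over all i < j < k of h_ij + h_jk - h_ik."""
    n = len(H)
    return sum(H[i][j] + H[j][k] - H[i][k]
               for i, j, k in combinations(range(n), 3))


def diagonal_shortcut(H):
    """L from Equations (46)-(47): diagonal sums S_c weighted by (n - 2c)."""
    n = len(H)
    total = 0
    for c in range(1, n):
        # S_c: sum of the c-th diagonal above the principal diagonal
        s_c = sum(H[i][i + c] for i in range(n - c))
        total += s_c * (n - 2 * c)
    return total


def additive_constant_1d(H):
    """Equation (45): divide L by the number of triples, n(n-1)(n-2)/6."""
    n = len(H)
    return 6 * diagonal_shortcut(H) / (n * (n - 1) * (n - 2))
```

The weight $(n - 2c)$ arises because an element $c$ places off the diagonal enters $n - c - 1$ triples with a positive sign and $c - 1$ triples with a negative sign.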

There is another procedure which might serve to give a fair estimate
of $d_{\cdot\cdot}$ for cases where the expected dimensionality is greater than one. If a
one-dimensional subspace of four or more points exists in the data, that
subspace could be used to estimate $d_{\cdot\cdot}$. While this procedure does not give
a "best fit" in the least-squares sense, it does appear to be the most practical
method suggested thus far. The method has been applied to actual data and
was found to work quite well. The existence of such a subspace is relatively
easy to determine. One can compute, for each set of three stimuli, the
value of $d_{\cdot\cdot}$ which would be required to locate the set of three along a straight
line. There are $n(n-1)(n-2)/6$ of these "estimates," one for each set of
three different stimuli. These triad estimates of $d_{\cdot\cdot}$ may be obtained from
the following equation:

$$d_{\cdot\cdot} = h_{ij} + h_{jk} - h_{ik}, \quad \text{where } h_{ik} < h_{ij},\, h_{jk}. \tag{48}$$

Given the $n(n-1)(n-2)/6$ triad estimates, the following points can be
noted:

a. Except for error, points most nearly in a straight line will give the
largest triad estimate.

b. If the four sets of three of any four points give about the same
"highest" triad estimate in a consistent manner, we can conclude that the four
points are in a one-dimensional subspace. This value would then be
the estimate of $d_{\cdot\cdot}$ wanted.

c. If such a set is not found, the largest triad estimate might still be worth
trying as an estimate of $d_{\cdot\cdot}$. Using this value is equivalent to assuming that
of the set of points at least one group of three is approximately linear. If
one constructs a $B$-matrix with one of the points at the origin using this
estimate and then finds that the third-order principal minors involving these
three points vanish (approximately), this value of $d_{\cdot\cdot}$ is probably a good
estimate. The entire $B$-matrix need not, of course, be constructed. One
would need to evaluate only $(n - 3)$ third-order principal minors involving
only $3(n - 2)$ distinct elements instead of the $(n - 1)(n - 2)/2$ elements
in the complete matrix.
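
The triad estimates of Equation (48) are simple to enumerate. The following Python sketch is illustrative and rests on one stated assumption: the smallest of the three comparative distances in a triad is taken as $h_{ik}$, so the estimate is the sum of the two larger values minus the smallest. The function name is hypothetical.

```python
from itertools import combinations


def triad_estimates(H):
    """Return {(i, j, k): estimate of d..} for every set of three stimuli.

    H is a symmetric matrix of comparative distances h_jk = d.. - d_jk.
    """
    n = len(H)
    estimates = {}
    for i, j, k in combinations(range(n), 3):
        hs = sorted([H[i][j], H[j][k], H[i][k]])
        # Equation (48): the two larger h's minus the smallest h
        estimates[(i, j, k)] = hs[1] + hs[2] - hs[0]
    return estimates
```

With errorless data, every triad of collinear points returns the true constant, and every other triad returns a smaller value, which is point (a) in the list above.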


Summary 


A set of methods for multidimensional scaling based on Richardson's
original model (3) has been developed, including a least-squares solution for
obtaining comparative distances, and routine procedures for estimating the
additive constant necessary to convert comparative distances to absolute
distances and for obtaining projections of stimuli on axes when fallible
absolute distances are given. An outline of the procedures developed is
given below.


A Routine Procedure for Multidimensional Scaling

A. To obtain comparative distances by the complete method of triads.

1. Construct the $n$ matrices ${}_k P_{ij}$ from the raw data.
2. Construct the corresponding matrices ${}_k X_{ij}$.
3. Obtain a row vector of averages of columns for each of the $n$ matrices ${}_k X_{ij}$:

$${}_k x_{\cdot j} = \frac{1}{n-1}\sum_{i} {}_k x_{ij}.$$

4. Construct matrix ${}_k X_{\cdot j}$ composed of these row vectors ($k$ designates
row, $j$ designates columns).
5. Obtain a row vector of averages of columns of ${}_k X_{\cdot j}$:

$${}_\cdot x_{\cdot j} = \frac{1}{n}\sum_{k} {}_k x_{\cdot j}.$$

6. Add the $g$th element of ${}_\cdot X_{\cdot j}$ to each element in the $g$th row of ${}_k X_{\cdot j}$.
Call this new matrix $G_{gh}$. Matrix $G_{gh}$ thus contains the elements
$({}_g x_{\cdot h} + {}_\cdot x_{\cdot g})$.

7. Average the symmetric elements of $G_{gh}$ to obtain the symmetric
matrix $H_{jk}$. Matrix $H_{jk}$ is composed of the elements $h_{jk} = d_{\cdot\cdot} - d_{jk}$,
the comparative distances (with a negative sign) between stimuli:

$$h_{gh} = h_{hg} = \tfrac{1}{2}(g_{gh} + g_{hg}).$$


B. To obtain an estimate of $d_{\cdot\cdot}$.

1. If the hypothesis of unidimensionality of stimuli seems reasonable:
a. Arrange rows and columns of $H_{jk}$ in order of magnitude of the
stimuli by
(1) Noting magnitudes of sums of columns of $H_{jk}$, and
(2) Examining elements of $H_{jk}$.
b. Obtain diagonal sums of $H_{jk}$ above the principal diagonal:

$$S_c = \sum_{i=1}^{n-c} h_{i(i+c)}.$$

c. Multiply each $S_c$ by $(n - 2c)$ and sum the products to obtain $L$:

$$L = \sum_{c=1}^{n-1} S_c (n - 2c).$$

d. Divide $L$ by $n(n-1)(n-2)/6$ to obtain $d_{\cdot\cdot}$:

$$d_{\cdot\cdot} = \frac{L}{n(n-1)(n-2)/6}.$$

2. If it is reasonable to assume dimensionality greater than one with at
least one set of four stimuli in a one-dimensional subspace:

a. Obtain the $n(n-1)(n-2)/6$ triad estimates of $d_{\cdot\cdot}$, assuming in turn that
each set of three stimuli lies in a line:

$$d_{\cdot\cdot} = h_{ij} + h_{jk} - h_{ik}, \qquad h_{ik} < h_{ij},\, h_{jk}.$$

b. The four sets of three of any four points lying in a line will give the
same "highest" triad estimate (except for error) in a consistent manner.
If such a set is found, this value will be a good estimate of $d_{\cdot\cdot}$.

c. If no such set is found, use the highest triad estimate obtained as an
estimate of $d_{\cdot\cdot}$. Compute the necessary elements of a matrix $B_i$
with one of the three points as the origin. Evaluate the $(n - 3)$
third-order principal minors of $B_i$. If these minors all vanish
(approximately), this $d_{\cdot\cdot}$ is probably a good estimate.





C. To obtain projections of stimuli on axes.

1. Construct $D_{jk}$:

$$d_{jk} = d_{\cdot\cdot} - h_{jk}.$$

2. Construct $B_i$ with origin at any stimulus $i$:

$$b_{jk} = \tfrac{1}{2}(d_{ij}^2 + d_{ik}^2 - d_{jk}^2).$$

3. Obtain from $B_i$ averages of
a. Columns,

$$b_{\cdot k} = \frac{1}{n}\sum_{j} b_{jk},$$

b. Rows,

$$b_{j \cdot} = \frac{1}{n}\sum_{k} b_{jk},$$

c. And all elements,

$$b_{\cdot\cdot} = \frac{1}{n^2}\sum_{j}\sum_{k} b_{jk}.$$

4. Construct matrix $B^*$ with origin at the centroid of stimuli:

$$b^*_{jk} = b_{jk} - b_{\cdot k} - b_{j \cdot} + b_{\cdot\cdot}.$$

5. Factor $B^*$, obtaining $A^*_{jm}$, the matrix of projections of stimuli $j$ on
arbitrary axes $m$.

6. Rotate and translate matrix $A^*_{jm}$ to a meaningful set of dimensions if
criteria for such are available.


REFERENCES

1. Gulliksen, Harold. Paired comparisons and the logic of measurement. Psychol. Rev., 1946, 53, 199-213.
2. Klingberg, F. L. Studies in measurement of the relations among sovereign states. Psychometrika, 1941, 6, 335-352.
3. Richardson, M. W. Multidimensional psychophysics. Psychol. Bull., 1938, 35, 659. (Abstract).
4. Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38, 368-389.
5. Young, G., and Householder, A. S. Discussion of a set of points in terms of their mutual distances. Psychometrika, 1938, 3, 19-22.


Manuscript received 5/24/52 


Revised manuscript received 7/14/52 














PSYCHOMETRIKA—VOL. 17, NO. 4 
DECEMBER, 1952 


THE AVERAGE SPEARMAN RANK CORRELATION 
COEFFICIENT* 


SAMUEL B. LYERLY 


WASHINGTON, D. C. 


A method is derived for finding the average Spearman rank correlation
coefficient of N sets of ranks with a single dependent or criterion ranking of
n items without computing any of the individual coefficients. Procedures
for calculating the exact distribution of $\rho_{av}$ for small values of N and n are
described for the null case. The first four moments about zero of this distribution
are derived, and it is concluded that for samples as small as N = 4
and n = 4 the normal distribution can be used safely in testing the hypothesis
$\rho_{av} = 0$.


The Spearman rank correlation coefficient $\rho$ (sometimes called the
"rank difference" coefficient after one method of calculating it) has been in
use among psychologists for about half a century (8). Recent researches
by Hotelling and Pabst (2), Kendall and his co-workers (4, 5, 6, 7), and
others have stimulated interest in rank correlation methods, largely because of
their usefulness as non-parametric procedures, i.e., to provide tests of the
null hypothesis in cases where the population distribution of either or both
variables is unknown. Kendall's recent book (7) contains an excellent
summary of rank correlation, including $\rho$ and his own alternative coefficient
$\tau$, which in some respects is superior to $\rho$.

A short-cut method for computing the average of the intercorrelations
of N ranked series each consisting of n items has been known for many years
(3, 218). This method may be expressed in the following formula:

$$\text{Average inter-}\rho = 1 - \frac{N(4n + 2)}{(N - 1)(n - 1)} + \frac{12\sum S^2}{N(N - 1)(n^3 - n)}, \tag{1}$$

where
N = number of sets of rankings,
n = number of ranks in each set, and
S = sum of rank numbers for a given object or stimulus.
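
Formula (1) can be checked against a direct average of all pairwise coefficients. The sketch below is illustrative; the function names are hypothetical, rankings are assumed to be untied permutations of 1 through n, and exact fractions are used so that the two routes can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations


def spearman_rho(a, b):
    """Spearman coefficient from rank differences: 1 - 6 * sum(d^2) / (n^3 - n)."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - Fraction(6 * d2, n ** 3 - n)


def average_inter_rho_direct(rankings):
    """Mean of the coefficients over all N(N-1)/2 pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


def average_inter_rho_shortcut(rankings):
    """Formula (1), using only the rank sums S for each object."""
    N, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    sum_s2 = sum(s * s for s in rank_sums)
    return (1 - Fraction(N * (4 * n + 2), (N - 1) * (n - 1))
            + Fraction(12 * sum_s2, N * (N - 1) * (n ** 3 - n)))
```

The shortcut needs only the n rank sums, whereas the direct route computes N(N - 1)/2 separate coefficients.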




Kendall and Babington Smith discuss this formula and present the
exact distribution of the mean intercorrelation in the null case for several
small values of N and n (6; see also 7, Chs. 6 and 7). Kendall's "coefficient of
concordance," W, is a simple function of the average rank intercorrelation
so designed that W ranges from 0 to 1 as the degree of agreement among
the sets of ranks ranges from no agreement at all to perfect agreement.
Approximate tests of significance of W (and hence of the average intercorrelation)
based upon the z (or F) distribution or upon $\chi^2$ are suggested
for use with larger values of N or n where the computation of the exact
probabilities is excessively laborious.


*This problem first came to the writer's attention in discussions with Dr. Dean J.
Clyde.

Occasionally there may arise a problem in which we are not interested
in the average intercorrelation of N sets of ranks, but we are concerned with
the average correlation of N sets of ranks with a single dependent or criterion
ranking. For example, we may ask N individuals to rank independently
n objects of art according to their merit, and we may have as a criterion
variable the rank-order listing of one or more experts. It is the purpose of
this paper to derive a short method of calculating the average of the N $\rho$'s
without computing each $\rho$ individually, and to investigate the problem of
the significance of the average $\rho$ in the null case.

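We shall let $y_i$ ($i = 1, 2, \ldots, n$) be the criterion or dependent set of
ranks and $x_{ij}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, N$) be the rank number assigned
to the $i$th stimulus by the $j$th individual. Then the square of the difference
in ranks for a single judgment, $(y_i - x_{ij})^2$, is equal to $y_i^2 - 2y_i x_{ij} + x_{ij}^2$.
The sum of these squares over a given $i$ is

$$\sum_{j=1}^{N} (y_i - x_{ij})^2 = N y_i^2 - 2 y_i \sum_{j=1}^{N} x_{ij} + \sum_{j=1}^{N} x_{ij}^2, \tag{2}$$

and the corresponding sum of squares over the whole table is

$$\sum_{i=1}^{n}\sum_{j=1}^{N} (y_i - x_{ij})^2 = N\sum_{i=1}^{n} y_i^2 - 2\sum_{i=1}^{n}\Bigl(y_i \sum_{j=1}^{N} x_{ij}\Bigr) + \sum_{i=1}^{n}\sum_{j=1}^{N} x_{ij}^2. \tag{3}$$

Here both $\sum\sum x^2$ and $N\sum y^2$ are N times the sum of squares of the first n
natural numbers and hence equal to $Nn(n + 1)(2n + 1)/6$. Thus (3) reduces
to

$$\sum_{i=1}^{n}\sum_{j=1}^{N} (y_i - x_{ij})^2 = 2\sum_{i=1}^{n}\sum_{j=1}^{N} x_{ij}^2 - 2\sum_{i=1}^{n}\Bigl(y_i \sum_{j=1}^{N} x_{ij}\Bigr) = \frac{2Nn(n + 1)(2n + 1)}{6} - 2\sum_{i=1}^{n}\Bigl(y_i \sum_{j=1}^{N} x_{ij}\Bigr), \tag{4}$$

and the average sum of squares of rank differences over the N individuals is

$$\frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{N} (y_i - x_{ij})^2 = \frac{2n(n + 1)(2n + 1)}{6} - \frac{2}{N}\sum_{i=1}^{n}\Bigl(y_i \sum_{j=1}^{N} x_{ij}\Bigr). \tag{5}$$

Substituting this average value of the sum of squares of differences into
the usual formula, $\rho = 1 - 6\sum d^2/(n^3 - n)$, we have

$$\rho_{av} = 1 - \frac{6\Bigl[2n(n + 1)(2n + 1)/6 - 2\sum_{i=1}^{n}\bigl(y_i \sum_{j=1}^{N} x_{ij}\bigr)/N\Bigr]}{n^3 - n} = 1 - \frac{2(2n + 1)}{n - 1} + \frac{12\sum_{i=1}^{n}\bigl(y_i \sum_{j=1}^{N} x_{ij}\bigr)}{N(n^3 - n)}. \tag{6}$$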


The following fictitious example will illustrate the computation of $\rho_{av}$:

           x (individual)
  y     1    2    3    4     Σx    yΣx
  1     3    1    2    2      8      8
  2     1    2    3    1      7     14
  3     4    4    4    3     15     45
  4     2    3    1    4     10     40
  Sums:                      40    107

Here N = 4 and n = 4, and

$$\rho_{av} = 1 - \frac{2(9)}{3} + \frac{12(107)}{4 \times 60} = .35.$$
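
The computation in Equation (6) can be reproduced and compared with the plain average of the N individual coefficients. The Python sketch below is illustrative; the function names are hypothetical, and the data are the fictitious rankings of the table above, entered one tuple per individual.

```python
from fractions import Fraction


def rho_av_shortcut(y, rankings):
    """Equation (6): average rho from the cross-product sum of y_i and rank sums."""
    N, n = len(rankings), len(y)
    cross = sum(y[i] * sum(r[i] for r in rankings) for i in range(n))
    return (1 - Fraction(2 * (2 * n + 1), n - 1)
            + Fraction(12 * cross, N * (n ** 3 - n)))


def rho_av_direct(y, rankings):
    """Average of the individual Spearman coefficients with the criterion."""
    n = len(y)
    rhos = [1 - Fraction(6 * sum((a - b) ** 2 for a, b in zip(y, r)), n ** 3 - n)
            for r in rankings]
    return sum(rhos) / len(rhos)


criterion = (1, 2, 3, 4)
# One tuple per individual: the ranks that individual gave to stimuli 1..4
judges = [(3, 1, 4, 2), (1, 2, 4, 3), (2, 3, 4, 1), (2, 1, 3, 4)]
```

Both routes give 7/20 = .35 for these data; the individual coefficients are 0, .8, -.2, and .8.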


It may easily be verified that $\rho_{av}$ is the product-moment correlation
coefficient of all Nn pairs of criterion-judgment ranks. This is a consequence
of the fact that the distributions of all subsamples (individuals' judgments)
are identical and the criterion distribution is likewise identical for all individuals;
hence the "total correlation" is equivalent to the mean correlation.

The distribution of $\rho_{av}$ in the null case for any values of N and n can be
found exactly, although a considerable amount of arithmetic is involved if
either N or n exceeds 7 or 8. Since $\rho_{av}$ is the average of N $\rho$'s, we need only
the distribution of $\rho$ in the null case for the appropriate n, and then by ordinary
combinatorial methods the distribution of means (or sums) of N samples
from this "population" can be found. Perhaps the most difficult part of the
process is finding the initial distribution of $\rho$ itself, although Kendall (5, 7)
has described methods of finding it and has tabulated the exact distributions
of $\rho$ (or of $\sum d^2$) for n = 3 to n = 8, inclusive. We shall illustrate the method
of finding the distribution of $\rho_{av}$ for n = 3 and N = 2 by starting with the
distribution of $\rho$ for n = 3 as given by Kendall.

The distribution of $\rho$ in the null case for n = 3 is as follows:

            Relative
    ρ       frequency
   1.00        1
    .50        2
    .00        0
  − .50        2
  −1.00        1

It should be noted that this is a discrete distribution and that only the
listed values of $\rho$ are possible, since the sum of squares of differences of 3
pairs of integers can take only a limited number of values. It will also be
noted that the distribution is symmetrical about zero. This will obviously
be true of the average $\rho$ in the null case.

To find the distribution of $\rho_{av}$ for N = 2, we merely calculate the distribution
of all possible combinations of 2 from the above table. A $\rho_{av}$ of
+1.00 may be obtained in only one way, i.e., when both single $\rho$'s are +1.00.
The next highest possible $\rho_{av}$ is +.75, which is obtained when one $\rho$ is +1.00
and the other is +.50. The relative frequency of this combination is 4,
since the sequence +1.00, +.50 can occur in two ways and so can the
sequence +.50, +1.00. The complete distribution may be obtained readily
by constructing a table as follows:

            1    2    0    2    1
     1      1    2    0    2    1
     2           2    4    0    4    2
     0                0    0    0    0    0
     2                     2    4    0    4    2
     1                          1    2    0    2    1
  Sums:     1    4    4    4   10    4    4    4    1

In this table the entries are all the possible products of pairs of elements of the
original distribution. Each row is displaced one space to the right of its
predecessor so that the new distribution can be obtained by merely adding
columns. The total frequency is $(n!)^2$; and, tabulating the distribution in
terms of $\rho_{av}$, we have:

            Relative
   ρ_av     frequency
   1.00        1
    .75        4
    .50        4
    .25        4
    .00       10
  − .25        4
  − .50        4
  − .75        4
  −1.00        1
  Total       36
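
The tabular method above amounts to repeated convolution of the null distribution of a single $\rho$. The following Python sketch is illustrative; it obtains the single-$\rho$ distribution by brute-force enumeration of permutations (feasible only for small n) rather than from Kendall's tables, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def rho_null_distribution(n):
    """Frequency of each value of rho over all n! rankings against 1..n."""
    base = tuple(range(1, n + 1))
    counts = Counter()
    for perm in permutations(base):
        d2 = sum((a - b) ** 2 for a, b in zip(base, perm))
        counts[1 - Fraction(6 * d2, n ** 3 - n)] += 1
    return counts


def rho_av_null_distribution(n, N):
    """Frequency of each value of the average of N independent rho's."""
    single = rho_null_distribution(n)
    sums = Counter({Fraction(0): 1})
    for _ in range(N):
        nxt = Counter()
        for s, f in sums.items():
            for r, g in single.items():
                nxt[s + r] += f * g  # product of frequencies, as in the table
        sums = nxt
    return Counter({s / N: f for s, f in sums.items()})
```

For n = 3 and N = 2 this returns the frequencies 1, 4, 4, 4, 10, 4, 4, 4, 1 with a total of 36, in agreement with the table.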





For small values of n and N the above method for finding the distribution
of $\rho_{av}$ is quite feasible. For large n and N the method is cumbersome,
largely because the initial distribution of $\rho$ itself is difficult to obtain. In
such circumstances it is natural to inquire whether a good approximation
can be found which will be satisfactory.

It is known that for large n the distribution of $\rho$ itself approaches normality,
but it is not known precisely how large n should be in order to use
the normal integral for testing purposes. Kendall (7), with some hesitation,
suggests 20 as a minimum n for which the normal curve may safely be used,
and proposes that $\rho\sqrt{(n - 2)/(1 - \rho^2)}$, treated as "Student's" t with n − 2
degrees of freedom, provides a better test in the range 8 < n < 20. (For
n < 8 exact probabilities have been computed.)

In the present case, where we have N sets of ranks, it is reasonable to
suppose that the approach to normality should be fairly rapid. Since $\rho_{av}$ is
the mean of N values of $\rho$, and since $\rho$ is approximately normally distributed,
it would follow from the Central Limit Theorem that, for fixed n, the distribution
of $\rho_{av}$ would approach the normal as N increases.

The variance of $\rho$ in the null case is $1/(n - 1)$. Its fourth moment
about zero (2) is

$$\mu_4 = \frac{3(25n^3 - 38n^2 - 35n + 72)}{25n(n + 1)(n - 1)^3}.$$

Here $\beta_1 = 0$ (all odd moments are zero by virtue of the symmetry of the
distribution), and

$$\beta_2 = 3 + \frac{6}{25}\left(\frac{36 - 5n - 19n^2}{n^3 - n}\right),$$

which approaches the normal value of 3 as n increases.

In samples of N from such a population, we calculate from well-known
theorems:

$$\mu_2 = \frac{1}{N(n - 1)},$$

$$\mu_4 = \frac{3(25n^3 - 38n^2 - 35n + 72)}{25N^3 n(n + 1)(n - 1)^3} + \frac{3(N - 1)}{N^3(n - 1)^2},$$

and hence

$$\beta_2 = 3 + \frac{6}{25N}\left(\frac{36 - 5n - 19n^2}{n^3 - n}\right).$$

Thus the distribution of $\rho_{av}$ approaches normality, as judged from its
first four moments, very rapidly as n and N increase. Table 1 lists $\beta_2$ for
certain small values of n and N.

TABLE 1
β2 for Small Values of n and N

                          N
   n      1      2      3      4      5      6
   2    1.00   2.00   2.33   2.50   2.60   2.67
   3    1.50   2.25   2.50   2.62   2.70   2.75
   4    1.85   2.42   2.62   2.71   2.77   2.81
   5    2.07   2.54   2.69   2.77   2.81   2.85
   6    2.23   2.61   2.74   2.81   2.84   2.87
   7    2.35   2.67   2.78   2.84   2.87   2.89




From Table 1 it can readily be seen that the normal value of 3 for $\beta_2$
is approximated very nearly for even small values of N and n. As a check
on the usefulness of the normal distribution for testing the null hypothesis,
the exact distribution of $\rho_{av}$ for N = 4 and n = 4 was calculated (these values
are probably as low as most experimenters would ever need to use), and the
.005, .010, .025, and .050 points determined both from the exact relative
frequencies and by using the normal approximation based upon the variance
$1/[N(n - 1)]$. (The four significance points were chosen because they provide
one-tail or two-tail tests of the null hypothesis at the 1% or 5% levels.)
Table 2 lists the relative frequencies in the tail of this distribution. From
Table 2 we see, incidentally, that the $\rho_{av}$ of .35 obtained in the fictitious
example above is not significant at the 5% level, since a value of .50 is
required for the one-tail test and .60 for the two-tail test.

TABLE 2
A Portion of the Distribution of $\rho_{av}$ for N = 4 and n = 4

              Relative     Cumulative    % cumulative
$\rho_{av}$   frequency    frequency     frequency

 1.00              1            1         .000003
  .95             12           13         .00004
  .90             58           71         .0002
  .85            160          231         .001
  .80            347          578         .002
  .75            704         1282         .004
  .70           1194         2476         .007
  .65           1852         4328         .013
  .60           2885         7213         .022
  .55           3968        11181         .034
  .50           5544        16725         .050
  .45           7196        23921         .072

Total Frequency = 331776
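
A tail of this kind can be obtained by direct enumeration. The sketch below is one possible way to do it, not necessarily the procedure the author followed: it counts the permutations of n ranks by their sum of squared rank differences, combines N independent sets, and converts the total S to $\rho_{av} = 1 - 6S/[Nn(n^2 - 1)]$. All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def sum_d2_counts(n: int) -> Counter:
    """Count the permutations of n ranks by their sum of squared rank differences."""
    ranks = range(1, n + 1)
    return Counter(
        sum((a - b) ** 2 for a, b in zip(ranks, perm))
        for perm in permutations(ranks)
    )

def convolve(a: Counter, b: Counter) -> Counter:
    """Frequency distribution of the sum of two independent totals."""
    out = Counter()
    for x, fx in a.items():
        for y, fy in b.items():
            out[x + y] += fx * fy
    return out

n, N = 4, 4
single = sum_d2_counts(n)
freq = Counter({0: 1})
for _ in range(N):
    freq = convolve(freq, single)

total = sum(freq.values())
scale = N * n * (n * n - 1) / 6          # rho_av = 1 - S / scale

cumulative = 0
for s in sorted(freq):                   # small S corresponds to large rho_av
    rho_av = 1 - s / scale
    if rho_av < 0.45 - 1e-9:
        break
    cumulative += freq[s]
    print(f"{rho_av:5.2f} {freq[s]:6d} {cumulative:7d} {cumulative / total:.6f}")
```

The same loop, with other values of n and N, gives exact tails wherever full enumeration is still feasible.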





Table 3 lists the significant values of $\rho_{av}$ as calculated from the exact
values in Table 2 and the significant values as estimated from the normal
approximation.


TABLE 3
Significant Points of the Distribution of $\rho_{av}$ for N = 4 and n = 4

                    Normal Approximation
Significance                    Next higher       Exact
level           By formula      possible value    value

.005               .744             .75            .75
.010               .671             .70            .70
.025               .566             .60            .60
.050               .475             .50            .50
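
The normal-approximation columns can be recomputed as in the sketch below, which assumes the null-case variance $1/[N(n - 1)]$ used in the text. The helper names are hypothetical. The “next higher possible value” relies on the fact that a sum of squared rank differences is always even, so that $\rho_{av}$ moves in steps of $12/[Nn(n^2 - 1)]$, which is .05 for N = 4 and n = 4.

```python
import math
from statistics import NormalDist

def normal_point(alpha: float, n: int, N: int) -> float:
    """Upper-tail alpha point of rho_av under the normal approximation."""
    sigma = math.sqrt(1.0 / (N * (n - 1)))
    return NormalDist().inv_cdf(1.0 - alpha) * sigma

def next_attainable(value: float, n: int, N: int) -> float:
    """Smallest lattice value of rho_av that is not below `value`."""
    step = 12.0 / (N * n * (n * n - 1))
    return 1.0 - step * math.floor((1.0 - value) / step + 1e-9)

for alpha in (0.005, 0.010, 0.025, 0.050):
    point = normal_point(alpha, 4, 4)
    print(f"{alpha:.3f}  {point:.3f}  {next_attainable(point, 4, 4):.2f}")
```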





From Table 3 we see that for N and n as low as 4, the use of the normal
integral in testing the significance of $\rho_{av}$ at the 1% or 5% levels, using either
the one-tail or the two-tail test, results in the same decision with respect to
accepting or rejecting the null hypothesis as the use of the exact distribution.
Since $\rho_{av}$ approaches normality even more closely with larger N and n,
and since most experimental problems would involve larger N or n or both,
we are on safe ground in concluding that the normal curve approximation is
appropriate in testing the null hypothesis of a zero correlation in the population
(or, more explicitly, of an average $\rho$ of zero with the criterion in the
population of sets of ranks from which the sample of sets was drawn).

In the non-null case, i.e., when the population $\rho_{av}$ is not zero, no exact
significance test is known, since the distribution of $\rho$ itself is unknown for
such populations. Thus we cannot test the hypothesis that an observed $\rho_{av}$
is a sample from a hypothetical population in which the average $\rho$ is some
value other than zero, nor can we make an exact test of the difference between
two sample values of $\rho_{av}$. Such tests must await the development of feasible
methods of calculating or approximating the distribution of $\rho$ for non-zero
population values.*

*One of the pre-publication reviewers of this paper has pointed out that the normal
deviate test of $\rho_{av}$ is equivalent to a $\chi^2$ test of a linear relationship among the sums of
ranks. Similar to Friedman’s $\chi^2$ (1), which he developed as a test of differences among the
sums of ranks (and hence as a test of “concordance”), a $\chi$ with one degree of freedom as
a test of linearity reduces to $\rho_{av}/\sigma_{\rho_{av}}$.


THE ORTHOGONAL APPROXIMATION OF AN OBLIQUE
STRUCTURE IN FACTOR ANALYSIS 


Bert F. GREEN 


MASSACHUSETTS INSTITUTE OF TECHNOLOGY* 


A procedure is derived for obtaining an orthogonal transformation 
which most nearly transforms one given matrix into another given matrix, 
according to some least-squares criterion of fit. From this procedure, three 
analytic methods are derived for obtaining an orthogonal factor matrix 
which closely approximates a given oblique factor matrix. The case is con- 
sidered of approximating a specified subset of oblique vectors by orthogonal 
vectors. 


Introduction 


In factor analysis problems, it is common practice to rotate the factor 
matrix to positive manifold or simple structure by means of oblique trans- 
formations. Many of the standard rotation techniques use oblique rather 
than orthogonal rotations. On the other hand, although the use of correlated 
factors in factor analysis is currently widespread, there is not complete accord 
on the issue. Some workers prefer to use only orthogonal reference frames. 
In addition to personal preferences, there may be other situations in which 
it is desired to have the final rotated factors orthogonal. When the results 
of the analysis are to be used in further mathematical formulas, such as in 
estimating factor scores (6) (11, Ch. 21), or in deriving multiple regression 
weights (3) (7) (9), orthogonal factors represent a simplification. 

Thus the problem has arisen of finding a set of orthogonal reference 
vectors which closely approximate a given oblique structure. This paper 
presents three methods by which this orthogonal reference frame can be 
determined analytically. These methods are all special cases of the general 
problem of a “best-fitting” orthogonal transformation.

It is of course possible to obtain an orthogonal reference frame by
rotating the oblique solution into an orthogonal frame, using standard
graphical rotation procedures. This relies heavily on subjective judgment.
One advantage of an analytic method over such a procedure is that
indeterminacy is eliminated: a single answer is always obtained. Also, for more
than three or four factors, rotating to an orthogonal structure may involve a
great deal of cutting and trying. On the other hand, if compact computational
methods are used, the computation involved in the analytic methods is not
overwhelming. It is the author’s opinion that the analytic methods will
probably turn out to be more efficient than the “cut and try” methods. In
fact, one of the main purposes in considering the problem was to discover
an efficient means of fitting an orthogonal transformation.


*Part of this research was carried out while the author was a psychometric fellow
at the Educational Testing Service, Princeton, New Jersey.
†This problem was first brought to the attention of the author by Dr. Dorothy C.
Adkins.

A simple procedure for obtaining an almost orthogonal transformation 
has recently been proposed by Gibson (5). He suggests that in many cases 
the normalized centroids of corresponding reference and primary axes will 
form a set of vectors that are almost mutually orthogonal. This is certainly 
a very useful device and will be sufficient for some problems. However, it 
will seldom yield exact orthogonality. If precision is required, additional 
graphical rotations will be necessary. Also, there are many configurations 
in which the proposed set of centroids will not approach orthogonality. In 
such cases more complex methods are necessary. 


General Problem 


Solution for the General Problem 


Let A and B be $k \times m$ matrices, with $k \ge m$, such that $A'B$ is of rank
m, and consider the problem of finding the $m \times m$ orthogonal transformation
$\Lambda$ ($\Lambda'\Lambda = I$) which transforms A into B. Since in general it will not be possible
to find a $\Lambda$ which satisfies the equation

$$A\Lambda = B, \tag{1}$$

it is necessary to find an orthogonal transformation $\Lambda$ such that the equation
is satisfied as nearly as possible; that is, so that $A\Lambda$ is as close as possible to
B, in some sense.

One definition of “closest” is the minimum of the sum of squared deviations.
In this sense, it is required to find the $\Lambda$ which minimizes the sum of
squares of elements in the difference matrix $(A\Lambda - B)$.

Mosier (8) considered a similar problem where $\Lambda$ is not restricted to an
orthogonal matrix, but may be any non-singular transformation. He also
dealt with the problem where the column vectors composing the transformation
matrix had unit length. In the problem considered in this paper,
it is required that the vectors be of unit length and that the scalar product
between any two different vectors be zero.





Let

$$A = \| a_{ig} \|; \quad A_g = g\text{th column of } A;$$
$$B = \| b_{is} \|; \quad B_s = s\text{th column of } B;$$
$$\Lambda = \| \lambda_{gs} \|; \quad \Lambda_s = s\text{th column of } \Lambda;$$
$$\Theta = \| \theta_{gh} \|; \quad \Theta_g = g\text{th row of } \Theta;$$

where $i = 1, 2, \cdots, k$; $g, s, h = 1, 2, \cdots, m$; $k \ge m$. It is specified that
$A'B$ be of rank m. The conditional equation on $\Lambda$ is

$$\Lambda\Lambda' = \Lambda'\Lambda = I. \tag{2}$$

Using the method of Lagrange to impose condition (2), we minimize the
function

$$2f = \sum_{s=1}^{m}\sum_{i=1}^{k}\left(\sum_{g=1}^{m} a_{ig}\lambda_{gs} - b_{is}\right)^2 + \sum_{g,h=1}^{m}\theta_{gh}\left(\sum_{p=1}^{m}\lambda_{gp}\lambda_{hp} - \delta_{gh}\right). \tag{3}$$

Here $\theta_{gh} = \theta_{hg}$ is a Lagrangian multiplier; $\delta_{gh}$ is the Kronecker delta ($\delta_{gh} = 0$,
for $g \ne h$; $\delta_{gg} = 1$); the factor of 2 is introduced for convenience only.
The partial derivative of f with respect to $\lambda_{gs}$ is

$$\frac{\partial f}{\partial \lambda_{gs}} = \sum_{i=1}^{k} a_{ig}\left(\sum_{p=1}^{m} a_{ip}\lambda_{ps} - b_{is}\right) + \sum_{h=1}^{m}\theta_{gh}\lambda_{hs}. \tag{4}$$

Equating (4) to zero and using matrix notation we have

$$A_g'(A\Lambda_s - B_s) + \Theta_g\Lambda_s = 0. \tag{5}$$

The complete set of equations is then

$$A'(A\Lambda - B) + \Theta\Lambda = 0. \tag{6}$$

This may be rewritten as

$$(A'A + \Theta)\Lambda = A'B. \tag{7}$$

The transpose of (7) is

$$\Lambda'(A'A + \Theta) = B'A. \tag{8}$$

Postmultiplying (7) by (8) and using the conditional equation (2), we get

$$(A'A + \Theta)^2 = A'BB'A. \tag{9}$$

Taking the square root of each side of (9), we have

$$(A'A + \Theta) = (A'BB'A)^{1/2}. \tag{10}$$

Substituting (10) into (7) and solving for $\Lambda$, we have

$$\Lambda = (A'BB'A)^{-1/2}A'B. \tag{11}$$


Consider $(A'BB'A)^{-1/2}$. The fractional exponent is defined as follows:
$Y = X^{1/2}$ if and only if $Y^2 = X$; by $X^{-1/2}$ is meant $Y^{-1}$. In general Y is not
unique. Now for any square symmetric matrix of rank equal to its order,
such as $A'BB'A$, it is possible to find an orthogonal matrix P and a diagonal
matrix D such that

$$A'BB'A = PDP'. \tag{12}$$

The columns of P are the latent vectors (principal axes) of $A'BB'A$, and the
diagonal elements in D are the corresponding latent roots (1, 73). Since
$A'B$ is of rank m and of order $m \times m$, $A'BB'A$ is of rank m, and all roots are
non-zero. Furthermore, since the product of a real matrix by its transpose
yields a Gramian matrix, all roots are positive. If the latent roots are distinct,
then P is unique except for the order of its columns. In a real symmetric
matrix, the occurrence of equal roots introduces some indeterminacy in the
matrix P. It is however always possible to find a P which satisfies (12). It
can be shown that all P’s which satisfy (12) will yield identical $\Lambda$’s.

From (12) we have simply

$$(A'BB'A)^{1/2} = PD^{1/2}P'. \tag{13}$$

To verify this, note that

$$PD^{1/2}P'PD^{1/2}P' = PDP'. \tag{14}$$

It follows that

$$(A'BB'A)^{-1/2} = PD^{-1/2}P'. \tag{15}$$


Since either the positive or negative square root may be taken for each
diagonal element in $D^{-1/2}$, there are $2^m$ solutions represented by equation (15).
It can be shown that the solution which minimizes (3) is obtained by taking
all square roots as positive.


Thus $\Lambda$ is obtained in the following way:


a. The latent roots and vectors of $A'BB'A$ are determined. A simple,
compact computational procedure for doing this has recently been
developed by Bryan (2). Other methods may be found in Dwyer
(4).

b. These roots and vectors are formed into the matrices D and P
respectively, so that the root in the jth diagonal cell of D has its
corresponding vector in the jth column of P, i.e., so that equation (12)
holds.

c. $D^{-1/2}$ is computed by taking the reciprocals of the positive square
roots of the corresponding diagonal elements in D.

d. Equations (15) and (11) are used to determine $\Lambda$.
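
Steps a through d can be sketched numerically as follows. This is an illustration under stated assumptions, not the hand-computation procedure of Bryan or Dwyer: it assumes NumPy is available and uses `numpy.linalg.eigh` for the latent roots and vectors. The function name `orthogonal_fit` and the test matrices are hypothetical.

```python
import numpy as np

def orthogonal_fit(A, B):
    """Return Lambda = (A'BB'A)^(-1/2) A'B, taking all square roots as positive."""
    M = A.T @ B                                   # A'B, assumed to be of rank m
    roots, P = np.linalg.eigh(M @ M.T)            # step a: latent roots and vectors
    D_inv_sqrt = np.diag(1.0 / np.sqrt(roots))    # step c: reciprocals of positive roots
    return P @ D_inv_sqrt @ P.T @ M               # step d: equations (15) and (11)

rng = np.random.default_rng(0)
k, m = 8, 3
A = rng.normal(size=(k, m))
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))      # a hypothetical "true" rotation
B = A @ Q + 0.05 * rng.normal(size=(k, m))        # A rotated, plus a little noise

Lam = orthogonal_fit(A, B)
print(np.round(Lam.T @ Lam, 6))                   # close to the identity matrix
print(np.linalg.norm(A @ Lam - B))                # root of the minimized sum of squares
```

When B is exactly an orthogonal transformation of A, the function returns that transformation; with noise it returns the orthogonal matrix giving the smallest sum of squared deviations.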


Weights 

Arbitrary fixed positive weights $w_i$ may be introduced in the function
to be minimized, equation (3). If we simply replace $a_{ig}$ by $w_i a^*_{ig}$, and $b_{is}$
by $w_i b^*_{is}$, the function would become

$$2f^* = \sum_{s=1}^{m}\sum_{i=1}^{k}\left[w_i\left(\sum_{g=1}^{m} a^*_{ig}\lambda_{gs} - b^*_{is}\right)\right]^2 + \sum_{g=1}^{m}\sum_{h=1}^{m}\theta_{gh}\left(\sum_{p=1}^{m}\lambda_{gp}\lambda_{hp} - \delta_{gh}\right). \tag{16}$$

Here we initially have the matrices $A^*$ and $B^*$, and wish to find the orthogonal
transformation which minimizes $f^*$, (16). Let us define a $k \times k$ diagonal
matrix of weights, W, whose ith diagonal entry is $w_i$. Then, in order to
use the general solution, (11), we may simply define A and B by

$$A = WA^*, \quad B = WB^*. \tag{17}$$


Equation (11) may then be used to find the required orthogonal transforma- 
tion. It is perfectly acceptable to have some of the weights zero, as long as 
A and B, (17), satisfy the requirements specified in the general development; 
that is, A’B must be of rank m. 
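
A small numerical sketch of the weighted case follows, assuming NumPy. The data are hypothetical: one row of $B^*$ is deliberately corrupted and then given zero weight, so that the weighted fit should recover the rotation used to generate the remaining rows. The helper `orthogonal_fit` is an illustrative implementation of equation (11), not code from the paper.

```python
import numpy as np

def orthogonal_fit(A, B):
    """Lambda = (A'BB'A)^(-1/2) A'B, equation (11), with positive square roots."""
    M = A.T @ B
    roots, P = np.linalg.eigh(M @ M.T)
    return P @ np.diag(1.0 / np.sqrt(roots)) @ P.T @ M

rng = np.random.default_rng(1)
A_star = rng.normal(size=(6, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # hypothetical rotation
B_star = A_star @ Q
B_star[0] += 5.0                                  # corrupt the first row

w = np.array([0.0, 1.0, 1.0, 1.0, 1.0, 1.0])      # zero weight on the corrupted row
W = np.diag(w)

Lam_weighted = orthogonal_fit(W @ A_star, W @ B_star)   # A = WA*, B = WB*, equation (17)
Lam_plain = orthogonal_fit(A_star, B_star)              # unweighted, for comparison
print(np.abs(Lam_weighted - Q).max(), np.abs(Lam_plain - Q).max())
```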

Note that it is not possible to introduce weights $w_s$ in the present solution.
This would amount to weighting the various orthogonal vectors in $\Lambda$
differentially. To do this would require a different solution from the one
considered in this paper.


Application to Factor Analysis 


Definitions for the Factor Analysis Case 


In the special factor analysis problem, it is required to find the orthogonal
structure closest in some sense to a given oblique structure. “Closest” may
be defined in at least three different ways, each of which yields a solution which
is a special case of the above general result.


Let the following matrices be defined: 


F = original factor matrix, with k tests (rows), and m factors (columns).

V = final oblique factor matrix.

G = oblique transformation. The various columns of G give the
direction cosines of the various reference vectors with respect to
the original uncorrelated factors.

$\Lambda$ = orthogonal transformation. $\Lambda'\Lambda = I$.

H = matrix of primary axes. The various rows of H give the direction
cosines of the various primary axes with respect to the original
uncorrelated factors.

$D_1$ = matrix whose elements are cosines of angles between primary
axes and oblique reference vectors and therefore a diagonal matrix.


The rth reference vector is the normal to the rth hyperplane; the rth primary
axis is the intersection of the (m − 1) hyperplanes excluding the rth. The
following relationships exist:

R = FF', where R is the correlation matrix with communalities in the
diagonal,

V = FG,

and

HG = $D_1$.

Detailed discussions of oblique structure have been given by Tucker (10)
and Thurstone (11, Ch. 15), using slightly different notation. [Our G is
Tucker’s $\Lambda$ and Thurstone’s $\Lambda$; our H is Tucker’s H and Thurstone’s T; our
$D_1$ is Tucker’s D and Thurstone’s D. Gibson (5) uses Thurstone’s notation.]


First Method 


One criterion for selecting the orthogonal transformation is based on a
consideration of the oblique factor matrix V and the approximating orthogonal
factor matrix $F\Lambda$. We may require the orthogonal factor loadings of each
test to be as nearly similar as possible to its oblique factor loadings, according
to a criterion of minimum sum of squares. To do this, we would minimize
the sum of squares of elements in the difference matrix $(F\Lambda - V)$. Letting
F be A and V be B in (11), we have

$$\Lambda = (F'VV'F)^{-1/2}F'V; \tag{18}$$

or, since V = FG,

$$\Lambda = (F'FGG'F'F)^{-1/2}F'FG. \tag{19}$$


There is an alternative geometrical interpretation of this criterion of fit.
A transformation of F may be viewed either as a rotation of axes with respect
to the fixed points (vectors), or as a transformation of points with respect to
the original fixed axes. In the latter interpretation, one could require that
the distances between the points defined by $F\Lambda$ and those defined by FG be
as small as possible. For test j, let the point defined by $F\Lambda$ be $e_j$ with
coordinates $e_{js}$, and let the corresponding point defined by V be $v_j$, with
coordinates $v_{js}$. The square of the distance between points $e_j$ and $v_j$ is
$\sum_{s=1}^{m}(e_{js} - v_{js})^2$. Minimizing the sum over j of these quantities is identical
with minimizing the sum of squares of elements in the matrix $(F\Lambda - V)$.

This first method for obtaining $\Lambda$ provides for least over-all change in
the factor loadings, and is a natural criterion to consider. However, it is 
weighted by the particular tests in the battery, and would not be invariant 
for a different selection of tests. Also, it may be said in general that each 
test vector has a weight in rough proportion to its communality. Thus the 
tests with the most common factor variance play the largest role in de- 
termining the orthogonal structure. 

This solution is also affected by the distribution of test vectors in the 
factor space. One way of viewing the criterion of fit used in this method is 
that for each test vector, the angle it makes with each new orthogonal axis 
is as nearly as possible equal to the angle it makes with the corresponding 
oblique reference axis. It is clear from this statement that nothing explicit 
is said about the angles between the reference axes and the orthogonal axes. 
These angles are determined by the distribution of test vectors. In most 
cases the tendency will be for these angles to be small. However, if many 
tests are grouped together in some part of the configuration, peculiar results 
may be obtained. If most of the test vectors are grouped in the center of
the configuration with only a few vectors describing the corners of the
structure, then there might be large changes in the loadings of the tests in the
corner, in order to keep the center group relatively unaffected.

It may be noted that a criterion of fit very similar to this first method
would consider the descriptions of the tests in terms of linear combinations
of primary axes. These linear combinations are given by $VD_1^{-1}$. Thus,
one might require the orthogonal factor loadings to differ as little as possible
from these linear coefficients. To do this, V would be replaced by $VD_1^{-1}$ in
equation (18).

$$\Lambda = (F'VD_1^{-2}V'F)^{-1/2}F'VD_1^{-1}. \tag{20}$$

If all the diagonal entries in $D_1$ are equal, equation (20) reduces to
equation (18).


Second Method 

The difficulties of the first method of determining $\Lambda$ indicate that it
would be desirable to have a method which did not depend on the particular
distribution of test vectors in the configuration. One such method is obtained
by requiring the new orthogonal axes to be as close as possible to the oblique
reference axes defined by G.

In this case, “close” may be defined by maximizing the squares of the
cosines of angles between corresponding axes and simultaneously minimizing
the squares of cosines between non-corresponding axes. The intercosines
are given by $G'\Lambda$. Thus the function to be minimized is the sum of squares
of elements in $(G'\Lambda - I)$. From (11),

$$\Lambda = (GG')^{-1/2}G. \tag{21}$$

It should be noted that this is equivalent to maximizing the sum of the
cosines of angles between corresponding axes, with no attention given to the
angles between non-corresponding axes. In the preceding paragraph we
minimized

$$f = \sum_{p=1}^{m}\sum_{q=1}^{m}\left(\sum_{s=1}^{m} g_{sq}\lambda_{sp} - \delta_{qp}\right)^2 + \theta, \tag{22}$$

where $\theta$ symbolizes the Lagrangian conditions, $G = \| g_{sq} \|$, $\Lambda = \| \lambda_{sp} \|$;
and $\delta_{qp}$ is the Kronecker delta. Expanding (22) we get

$$f = \sum_p\sum_q\sum_s\sum_t g_{sq}g_{tq}\lambda_{sp}\lambda_{tp} - 2\sum_p\sum_q\sum_s \delta_{qp}\,g_{sq}\lambda_{sp} + m + \theta, \tag{23}$$

where all summations are from 1 to m. This reduces to

$$f = \sum_s\sum_q g_{sq}^2 - 2\sum_s\sum_p g_{sp}\lambda_{sp} + m + \theta. \tag{24}$$

Thus minimizing f with respect to $\lambda_{sp}$ is the same as maximizing $\sum_s\sum_p g_{sp}\lambda_{sp}$,
which is the sum of cosines between corresponding axes.
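
The equivalence just stated can be checked numerically. The sketch below assumes NumPy and a hypothetical matrix G with unit-length columns; it computes $\Lambda = (GG')^{-1/2}G$ and compares the sum of cosines between corresponding axes, the trace of $G'\Lambda$, with the same sum for a batch of randomly generated orthogonal matrices. It is a spot check, not a proof.

```python
import numpy as np

def inv_sqrt_sym(S):
    """Inverse square root of a symmetric positive definite matrix, positive roots."""
    roots, P = np.linalg.eigh(S)
    return P @ np.diag(1.0 / np.sqrt(roots)) @ P.T

rng = np.random.default_rng(2)
m = 4
G = rng.normal(size=(m, m))
G /= np.linalg.norm(G, axis=0)          # hypothetical unit-length reference vectors

Lam = inv_sqrt_sym(G @ G.T) @ G         # equation (21)
best = np.trace(G.T @ Lam)              # sum of cosines between corresponding axes

trial_sums = []
for _ in range(200):
    Q, _ = np.linalg.qr(rng.normal(size=(m, m)))   # some other orthogonal frame
    trial_sums.append(np.trace(G.T @ Q))

print(best, max(trial_sums))            # no trial frame should exceed `best`
```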


Third Method


Instead of requiring the orthogonal axes to be as close as possible to the
reference axes, one can require them to be close to the primary axes. In this
method we use the same definition of closeness that was used in the second
method, with primary axes replacing reference axes. To obtain the solution
under this criterion, $H'$ is substituted for G in (21), which gives

$$\Lambda = (H'H)^{-1/2}H'. \tag{25}$$


Again it may be demonstrated that this result is equivalent to maximizing 
the sum of cosines between corresponding orthogonal and primary axes. 

The author has not been able to find any simple analytic relationship
between the second and third methods of solution. It is possible to show
that if all diagonal entries in $D_1$ are equal, that is, if all cosines of angles
between corresponding reference vectors and primary axes are equal, then the
solutions are identical. In this case, equation (25) reduces to equation (21).
Since these cosines are usually quite similar, it seems likely that equations
(21) and (25) will yield very similar results in most cases.
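
The relation between the two solutions can be explored numerically. The sketch below assumes NumPy and obtains H from G by scaling the rows of $G^{-1}$ to unit length, which makes HG diagonal; that construction, the helper names, and the example matrices are assumptions made for illustration. The first G is built symmetrically so that all diagonal entries of $D_1$ are equal; the second is a random configuration.

```python
import numpy as np

def inv_sqrt_sym(S):
    """Inverse square root of a symmetric positive definite matrix, positive roots."""
    roots, P = np.linalg.eigh(S)
    return P @ np.diag(1.0 / np.sqrt(roots)) @ P.T

def primary_axes(G):
    """Rows of G^{-1} scaled to unit length, so that H G is a diagonal matrix."""
    G_inv = np.linalg.inv(G)
    return G_inv / np.linalg.norm(G_inv, axis=1, keepdims=True)

def second_method(G):
    return inv_sqrt_sym(G @ G.T) @ G            # equation (21)

def third_method(H):
    return inv_sqrt_sym(H.T @ H) @ H.T          # equation (25)

m = 3
G_equal = np.eye(m) + 0.2 * np.ones((m, m))     # symmetric, equally oblique vectors
G_equal /= np.linalg.norm(G_equal, axis=0)
H_equal = primary_axes(G_equal)
D1_equal = H_equal @ G_equal                    # diagonal with equal entries

rng = np.random.default_rng(3)
G_gen = rng.normal(size=(m, m))
G_gen /= np.linalg.norm(G_gen, axis=0)
H_gen = primary_axes(G_gen)
gap = np.abs(second_method(G_gen) - third_method(H_gen)).max()

print(np.round(np.diag(D1_equal), 4))
print(np.abs(second_method(G_equal) - third_method(H_equal)).max(), gap)
```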


Weights 


If for some reason it is desired to introduce differential weights, and if
there is some analytic or subjective method for determining the weights, then
the methods of section 3 may be used. In the first method of determining
$\Lambda$ in the factor analysis problem, we may weight the various tests differentially.
Then, in the general solution, equation (11), A = WF, B = WV. In the
second method, we may assign differential weights to particular reference
vectors. In this case, A = WG', B = W. In the third solution we may
assign differential weights to different primary axes. In this case, A = WH;
B = W.


The case of a subset of fixed vectors 


So far we have been considering a problem with m factors for which an 
m-dimensional oblique structure has been determined. Methods have been 
developed for obtaining a set of m mutually orthogonal vectors which were 
closest to the oblique structure according to some least-squares criterion. 
In some applications it may be desired to fix some of the transformation 
vectors in advance, on the basis of some other criteria, and to determine only 
the remaining vectors by the methods of this paper. 

In an oblique structure, some of the vectors may already be orthogonal 
among themselves and it may be desired not to change them. In other 
situations, there might be some oblique vectors which are so well determined 
by the configuration that it is decided to leave them unaltered and to use 
the orthogonal approximation only for the remaining vectors. In either 
case, r linearly independent vectors are fixed, thus determining a subspace 
of r dimensions, and it is required to find a set of orthogonal vectors which 
span the complementary $(m - r)$ space. In either case, the methods of this
paper could be used in the $(m - r)$ space orthogonal to the r fixed vectors.
It is only necessary to assume that the fixed vectors correspond to particular
vectors or factors in the structure, so that the $(m - r)$ orthogonal vectors to
be determined match a particular set of $(m - r)$ factors or axes in the original
structure. It is then possible to define the least-squares criterion in the
reduced $(m - r)$ space, and to use the general results presented above.

Let us partition the transformation matrix $\Lambda$ into the r fixed column
vectors $\Lambda_1$, and the $(m - r)$ unknown column vectors $\Lambda_2$. Thus

$$\Lambda = \| \Lambda_1 \,\vdots\, \Lambda_2 \|.$$

Here the restrictions are that $\Lambda_1'\Lambda_2 = 0$, and $\Lambda_2'\Lambda_2 = I$. $\Lambda$ is not restricted
to an orthogonal matrix since we do not need $\Lambda_1'\Lambda_1 = I$. The matrix V
is also partitioned into the r columns $V_1$ corresponding to $\Lambda_1$, and the
$(m - r)$ columns $V_2$ corresponding to $\Lambda_2$. Likewise we have $G_1$ and $G_2$, and
$H_1'$ and $H_2'$.

It is first necessary to project the structure onto the $(m - r)$ space
which is orthogonal to the r fixed vectors. To do that we may use any set of
$(m - r)$ orthogonal vectors denoted by the $m \times (m - r)$ matrix C, such that

$$C'C = I; \quad \Lambda_1'C = 0. \tag{26}$$

These vectors merely define the $(m - r)$ subspace in which we will work.

It is a perfectly straightforward task to find a C which satisfies the 
restrictions specified by equations (26). However, this may be an arduous 
task if more than two or three vectors are fixed. 

One method of obtaining C would be to determine each column
successively. Calling the first column $c_1$, we let the last $(m - r - 1)$ elements
of $c_1$ be zero. We then have $\Lambda_1'c_1 = 0$ as a set of homogeneous linear equations
to solve for $c_1$. Then for $c_2$, we let the last $(m - r - 2)$ elements be
zero, and solve the set of equations represented by $\| \Lambda_1 \,\vdots\, c_1 \|' c_2 = 0$. In
this method it is necessary to compute inverses of order $r \times r$,
$(r + 1) \times (r + 1)$, $\cdots$, $(m - 1) \times (m - 1)$.

Another method requires only inverses of order $(r \times r)$ or less. First
an auxiliary matrix, U, of the same order as C is constructed. The matrix
$\| \Lambda_1 \,\vdots\, U \|$ is then broken up into submatrices according to the following
schema:

$$\| \Lambda_1 \,\vdots\, U_1 \,\vdots\, U_2 \,\vdots\, U_3 \| = \begin{Vmatrix} A & A^* & A & A \\ B & X & B & B \\ C & 0 & Y & C \\ D & 0 & 0 & Z \end{Vmatrix}$$

Here, $m = 3r + t$; A is of order $t \times r$; $A^*$ is of order $t \times t$, made up of
any t columns of A; X is of order $r \times t$; B, C, D, Y, and Z are of order $r \times r$
(t < r). We have $U = \| U_1 \,\vdots\, U_2 \,\vdots\, U_3 \|$, where $U_1$, $U_2$, and $U_3$ are the
successive column blocks of the schema.

Now orthogonality among blocks is obtained by writing the following
equations:

$$\Lambda_1'U_1 = U_2'U_1 = U_3'U_1 = A'A^* + B'X = 0,$$

$$\Lambda_1'U_2 = U_3'U_2 = A'A + B'B + C'Y = 0,$$

and

$$\Lambda_1'U_3 = A'A + B'B + C'C + D'Z = 0.$$

Solving for the unknown submatrices we have

$$X = -(B')^{-1}A'A^*,$$

$$Y = -(C')^{-1}(A'A + B'B),$$

and

$$Z = -(D')^{-1}(A'A + B'B + C'C).$$

Next, for each $U_i$ we must find a corresponding $T_i$ such that $(U_iT_i)'U_iT_i = I$.
To do this we form $U_i'U_i$ and factor completely, by the diagonal
method, yielding

$$U_i'U_i = L_iL_i',$$

where $L_i$ is triangular. Then $T_i = (L_i')^{-1}$. (The inverse of a triangular
matrix may be determined very swiftly by solving in succession the equations
represented by $TL' = I$.)

Then we have

$$C_1 = U_1T_1; \quad C_2 = U_2T_2; \quad C_3 = U_3T_3;$$

$$C = \| C_1 \,\vdots\, C_2 \,\vdots\, C_3 \|.$$

This method is general; it works for any m and r.
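
The block construction can be traced numerically for a small case. The sketch below assumes NumPy, takes r = 2 and t = 1 (so m = 7), and lays out the blocks of $\Lambda_1$ and U as rows of orders t, r, r, r with $U_1 = (A^*, X, 0, 0)$, $U_2 = (A, B, Y, 0)$, $U_3 = (A, B, C, Z)$; that layout is inferred from the orthogonality equations and should be read as an assumption. A Cholesky factorization stands in for the “diagonal method,” and the fixed vectors are random, hypothetical data.

```python
import numpy as np

def orthonormalize(U_i):
    """Return U_i T_i, with T_i = (L_i')^{-1} and U_i'U_i = L_i L_i'."""
    L_i = np.linalg.cholesky(U_i.T @ U_i)     # lower triangular factor
    T_i = np.linalg.inv(L_i.T)
    return U_i @ T_i

rng = np.random.default_rng(4)
r, t = 2, 1
m = 3 * r + t
Lam1 = rng.normal(size=(m, r))                # hypothetical fixed vectors

# Row blocks of Lambda_1: A is t x r; B, C_blk, D are r x r.
A = Lam1[:t]
B = Lam1[t:t + r]
C_blk = Lam1[t + r:t + 2 * r]
D = Lam1[t + 2 * r:]
A_star = A[:, :t]                             # any t columns of A

X = -np.linalg.solve(B.T, A.T @ A_star)
Y = -np.linalg.solve(C_blk.T, A.T @ A + B.T @ B)
Z = -np.linalg.solve(D.T, A.T @ A + B.T @ B + C_blk.T @ C_blk)

U1 = np.vstack([A_star, X, np.zeros((r, t)), np.zeros((r, t))])
U2 = np.vstack([A, B, Y, np.zeros((r, r))])
U3 = np.vstack([A, B, C_blk, Z])

C = np.hstack([orthonormalize(U) for U in (U1, U2, U3)])
print(np.abs(C.T @ C - np.eye(m - r)).max())  # C'C = I
print(np.abs(Lam1.T @ C).max())               # Lambda_1'C = 0
```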

Now, in order to use the first criterion for determining $\Lambda_2$ (First Method,
above), we obtain FC. This gives the projections of the test vectors on the space
defined by C. Using FC for A and $V_2$ for B in the general solution (11), we
obtain $\Lambda_{FC}$, an orthogonal transformation of order $(m - r) \times (m - r)$:

$$\Lambda_{FC} = (C'F'V_2V_2'FC)^{-1/2}C'F'V_2. \tag{27}$$

Finally,

$$\Lambda_2 = C\Lambda_{FC}. \tag{28}$$

To use the second method (Second Method, above), we obtain $G_2'C$, which is
used in place of $G'$ in (21) to obtain $\Lambda_{GC}$.

$$\Lambda_{GC} = (C'G_2G_2'C)^{-1/2}C'G_2. \tag{29}$$

From this we have

$$\Lambda_2 = C\Lambda_{GC}. \tag{30}$$

To use the third method of solution (Third Method, above), we obtain $H_2C$,
which is used in place of H in (25) to obtain $\Lambda_{HC}$.

$$\Lambda_{HC} = (C'H_2'H_2C)^{-1/2}C'H_2'. \tag{31}$$

From this, we have

$$\Lambda_2 = C\Lambda_{HC}. \tag{32}$$
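
As an illustration of equations (29) and (30), the sketch below fixes r hypothetical vectors, obtains a matrix C satisfying (26), and computes $\Lambda_2$ by the second method. It assumes NumPy. Note that C is taken here from the singular value decomposition of $\Lambda_1'$, which is a substitute for the constructions described in the text, not a reproduction of them; all names and data are hypothetical.

```python
import numpy as np

def inv_sqrt_sym(S):
    """Inverse square root of a symmetric positive definite matrix, positive roots."""
    roots, P = np.linalg.eigh(S)
    return P @ np.diag(1.0 / np.sqrt(roots)) @ P.T

rng = np.random.default_rng(5)
m, r = 5, 2
Lam1 = rng.normal(size=(m, r))
Lam1 /= np.linalg.norm(Lam1, axis=0)          # hypothetical fixed vectors
G2 = rng.normal(size=(m, m - r))
G2 /= np.linalg.norm(G2, axis=0)              # hypothetical reference vectors to approximate

# Rows r..m-1 of Vt span the space orthogonal to the columns of Lambda_1.
_, _, Vt = np.linalg.svd(Lam1.T)
C = Vt[r:].T                                  # m x (m - r); C'C = I, Lambda_1'C = 0

CG = C.T @ G2                                 # C'G_2
Lam_GC = inv_sqrt_sym(CG @ CG.T) @ CG         # equation (29)
Lam2 = C @ Lam_GC                             # equation (30)

print(np.abs(Lam1.T @ Lam2).max())            # orthogonal to the fixed vectors
print(np.abs(Lam2.T @ Lam2 - np.eye(m - r)).max())
```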


A special case results when $G_2$ (or $H_2'$) is entirely within the $(m - r)$
subspace defined by C, that is, when the column vectors of $G_2$ and those of
C span the same space. In this case, since the subspace is defined by $G_2$, it
is not necessary to determine C; (30) reduces to

$$\Lambda_2 = G_2(G_2'G_2)^{-1/2}. \tag{33}$$

This case is equivalent to the case in which $\Lambda_1$ is disregarded. (Either $\Lambda_1'G_2 = 0$
and doesn’t concern us, or $\Lambda_1$ is simply disregarded.) Here the function
maximized is the sum of cosines of angles between corresponding column
vectors in $\Lambda_2$ and $G_2$, subject to the restriction that $\Lambda_2'\Lambda_2 = I$. Equation
(33) can be derived directly from the maximization, or from (30).
When $\Lambda_2 = \Lambda$ and $G_2 = G$, it can be shown that (33) is equivalent to (21).
For $H_2'$, we have in the special case,

$$\Lambda_2 = H_2'(H_2H_2')^{-1/2}. \tag{34}$$

Summary

The problem has arisen in factor analysis of finding an orthogonal
structure which approximates a given oblique structure. In order to solve this
problem, a more general problem is considered; this is the problem of finding
the orthogonal transformation which most nearly transforms one matrix into
another, according to a least-squares criterion of fit. This general solution is
represented by equation (11), with equation (15) defining one of the factors
in (11). Arbitrary weights may be introduced in this general solution. For
the factor analysis problem three analytic methods of determining the
required orthogonal transformation are considered. The first method minimizes
the sum of squared differences between oblique and orthogonal factor
loadings. This solution depends on the size and distribution of test vectors in the
configuration. The second method maximizes the sum of cosines between
corresponding reference and orthogonal axes. The third method maximizes the
sum of cosines between corresponding primary and orthogonal axes.
Arbitrary weights may be introduced in each method. The case is considered in
which some of the axes are fixed in advance and the others are to be
determined by the analytic methods developed in the paper.

Which of the methods is to be used in any problem depends on which 
criterion of closeness of fit is chosen by the investigator as being most ap- 
propriate for the particular problem. 


BOOK REVIEW


Applied Statistics. A Journal of the Royal Statistical Society. L. H. C. TIPPETT, editor,
Vol. 1, No. 1, March 1952. Oliver and Boyd, Ltd. London: 98 Great Russell Street,
W. C. Single number 10s; Annual subscription 25s.


Control charts make their most important contribution in the field of personal
relationships; in Great Britain, however, quality control (but not necessarily the use
of statistics in industry) suffered a setback after the war; if $x_1$ is the reciprocal of the
infant mortality rate per 1000, $x_2$ a logarithmic measure of “hereditaments” per head,
$x_3$ illegitimacy rate per 1000 “related live births,” y the percentage of households where
the weekly basic wage of the senior wage earner is at least £7 10s, in a given town, then

$$y = 0.53x_1 + 0.67x_2 + 0.21x_3,$$

and a multiple correlation coefficient of .90 is found; the proportion voting conservative
is not a good predictor; but the proportion of registered electors liable to jury service
correlates well (from .6 to .8 or more) with the proportions in the “upper income group” in
an administrative district; a factor analytic study shows Communists and Fascists to
be alike very “tough-minded” though at opposite poles of a conservatism-radicalism
scale, whereas Liberals lie in the middle of the latter scale but are extremely
“tender-minded.”

One should not infer, from the above list of oddments, that the new periodical Applied
Statistics is intended merely as a compendium of miscellaneous information. Nevertheless,
a heterogeneous assortment of topics is inevitable: The editor calls for papers of interest
to “economists and social scientists, medical scientists and agricultural scientists, chemists
and physicists, engineers and technologists.” While psychologists are not mentioned as such,
perhaps they are included as a species of the genus social scientist. In fact, this first issue
does contain an interesting article by Hans J. Eysenck entitled “Uses and Abuses of Factor
Analysis.” At any rate, the diversity of interests to which Mr. Tippett hopes the journal
will appeal is a reflection of the universality of statistical concepts.

Nevertheless, this universality is the universality of an abstraction, of mathematics,
in fact, whereas Applied Statistics is to avoid mathematics. In the entire issue there are
two pages devoted to a “mathematical derivation,” besides three linear equations: the
one given above, another in four, and one in two independent variables. The aim, as stated
in a foreword by A. Bradford Hill, “is to present, in one way or another but always simply
and clearly, the statistical approach and its value, and to illustrate in original articles
modern statistical methods in their everyday applications.” It is possible that in the
endeavor to interest so many, the journal might prove to be of interest to none.

Beyond a doubt, Mr. Tippett is aware of the difficulty, and the first issue, at least,
seems to avoid it very successfully. Though the sample is scarcely random, one can perhaps
draw an inference, whatever may be the level of confidence. In addition to the articles,
the Foreword, and an editorial, there are “Questions and Answers” (the editor promises
to have the questions “answered by people competent to deal with the subjects raised”),
“Notes and Comments,” a book review, and reports and abstracts of addresses from the
Industrial Applications Section of the Royal Statistical Society and from the Study Section
of the Royal Statistical Society.

It seems appropriate to close this review with mention of the last of the abstracts.
Mr. R. J. E. Silvey discussed a “survey of the structure of the television public and the
effects of television on leisure.” In households possessing TV, “Records were obtained
of the behavior during one day of each individual.” “A control sample was obtained by
interviewing persons in households without television sets in the immediate vicinity.
The findings are due to be published in the April issue of the BBC Quarterly.”


Oak Ridge National Laboratory A. S. Householder 